1.0  Introduction

This study considers the problem of estimating the number of
errors in a software package and its mean time-to-failure (MTTF).
Emphasis is placed on comparing and evaluating various estimators
and on determining their accuracy. The estimation problem is
described below.


1.1  The  Estimation  Problem 

Consider a software package being tested to detect program errors.
Let the testing start at $t = 0$ and denote the error detection times by
$t_1, t_2, \ldots, t_n$. Also define the "inter-detection" times, $x_i$, the time
intervals between the detection of errors, as

$$x_i = \begin{cases} t_i - t_{i-1}, & i = 2, 3, \ldots, n \\ t_i, & i = 1 \end{cases} \tag{1-1}$$

The correction of errors can be done in two possible ways. The
first possibility is to have all errors corrected immediately after
detection. An equivalent method is to correct the errors at any
time after discovery but not to count rediscoveries of those errors
as new ones. The second correction method accumulates the detected
errors and, at some discrete times $t_j$, corrects several errors, $n_j$.
The first method is called the Instant Correction method and it is
discussed in Sections 2 and 3. The second method is called the
Delayed Correction method and it is discussed in Sections 4 and 5.

It is assumed that the program size remains the same during the
test phase. Furthermore, since there is no reliable information
about the number of errors which are introduced during the corrections,
it is assumed that the number of new errors is small and therefore
negligible. The objective is then to estimate $N$, the initial
number of errors in the program, and $T$, the MTTF after detecting $n$ errors,
from the sequence $x_1, x_2, \ldots, x_n$.

First note that during the interval $t_{i-1} < t < t_i$ the number of
errors in the program is constant. Therefore, it is reasonable to assume
that the error detection rate will be constant too. The error detection
rate, which is also called the hazard rate, is denoted by $z_i$. This
assumption was made in all the reported models, except for the one by
Schick and Wolverton [8], where it was assumed that the hazard increases
linearly with time. Since we could not find a physical justification for
this model in our case, we did not use it.

Once it is agreed that the hazard rate during $t_{i-1} < t < t_i$ is a
constant and equals $z_i$, the probability density function for $x_i$ can
be derived [11]; it is found to be exponential:

$$f(x_i) = z_i e^{-z_i x_i} \tag{1-2}$$

The mean value of $x_i$, which is denoted by $T_i$, is actually the MTTF
before the detection of the $i$th error. It is given by

$$T_i = \frac{1}{z_i} \tag{1-3}$$

After some errors are corrected, the hazard function will vary. The
two main models of Shooman [4] and Jelinski and Moranda [1]
assume that the hazard function is proportional to the number of
remaining errors. Therefore, the two models are equivalent for our
case. We prefer to use the formulation of Jelinski as it gives $\hat{N}$
directly. The hazard function is therefore assumed to be:

$$z_i = \phi (N-i+1) \tag{1-4}$$

where $(N-i+1)$ is the number of remaining errors during the time
$t_{i-1} < t < t_i$ and $\phi$ is a positive constant. This model is called the
Standard model.
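
As a small illustration of (1-3) and (1-4), the following sketch evaluates the hazard rate and the MTTF of the Standard model. The function names are hypothetical and are not taken from the report; the parameter values in the usage note are the ones used later in Section 3.

```python
def hazard_rate(N, phi, i):
    """Hazard rate z_i = phi * (N - i + 1) before the i-th detection, as in (1-4)."""
    remaining = N - i + 1  # errors still in the program before detection i
    if remaining <= 0:
        raise ValueError("no errors remain before detection number i")
    return phi * remaining


def mttf_before(N, phi, i):
    """MTTF T_i = 1 / z_i before the i-th detection, as in (1-3)."""
    return 1.0 / hazard_rate(N, phi, i)
```

With N = 60 and phi = 0.1, `mttf_before(60, 0.1, 51)` gives the MTTF after 50 errors have been detected and corrected, which is 1/(0.1 * 10).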

A different relationship between $z_i$ and $i$ was suggested by
Jelinski and Moranda. It is given by (1-5):

$$z_i = \lambda_0 a^i \tag{1-5}$$

Both $\lambda_0$ and $a$ are positive constants. This model assumes that
the hazard function changes by a constant ratio and therefore it is
called the Geometric model.

Another model was developed by Musa [7]. This model approximates
the number of errors, which is an integer, by a continuous real number.
Based on that, he found the expected number of errors to vary
exponentially, and the mean value of $t_i$ is given by

$$\bar{t}_i = -\frac{1}{\phi} \ln\left(1 - \frac{i}{N}\right) \tag{1-6}$$

This model is referred to as the Exponential model.

The next step is to select the data to be used for estimation.
While all the previous studies selected $x_i$ as the data, it was found
that the sequence $t_i$ may give better results. The reason for this is
that the $t$'s are the integrals (running sums) of the $x$'s, and therefore
will fluctuate less. We have used both the $x_i$ and the $t_i$ data for
estimation, with each estimator being designated as the x type or the t type.

Finally, when the model and the data are selected, one can still
select different types of estimators. The most common type is the
Maximum-Likelihood (ML) one. This was used in all the reported studies.
Another possible approach is to select the model parameters, $N$ and $\phi$
for example, in such a way that the difference between the data points
and their mean values is minimized in a least square sense. For
example, if we have the $x_i$ data, and we are using the Standard model, we
can find the mean value of $x_i$, which is denoted by $T_i$, from (1-3) and
(1-4):

$$T_i = \frac{1}{\phi(N-i+1)} \tag{1-7}$$

and define the estimation error $E$ by (1-8):

$$E = \sum_{i=1}^{n} (x_i - T_i)^2 = \sum_{i=1}^{n} \left[ x_i - \frac{1}{\phi(N-i+1)} \right]^2 \tag{1-8}$$

This estimation method selects the $N$ and $\phi$ which minimize $E$. This type
of estimator is called a least-square (LS) estimator. Note that it can
be used with the x or t type data, as well as with the Standard, Geometric or
Exponential models.

Next we may select any combination of models, data types and
estimation methods. However, some combinations lead to complex
analyses; therefore, they are not used. The seven combinations
which were selected are illustrated in Fig. 1-1. The resulting
estimators for the Instant Correction method are described in Section 2
and their results are evaluated in Section 3. Similar estimators
were developed for the Delayed Correction method. It was found that
the Exponential model cannot be modified for this case and therefore
it was not used here. The other six estimators are described in
Section 4 and their results are evaluated in Section 5.


In addition to determining the estimates, we wish to learn
more about their accuracy. This is done by two methods. The first
method is the development of confidence intervals for the estimates.
The second method is to study the effects of $N$ and $n$ on the accuracy
of the estimate. The two methods, along with some experimental
results, are given in Section 6.

Finally, the format for the data collection is important as it
describes the information to be gathered. This is especially important
in the case of delayed error correction, where the detected
errors should be grouped into several intervals. Furthermore, the
source of each error should be discovered, so that errors are not
counted more than once, if the reliability models of Section 2
are to be used. In order to make sure that all the required data are
collected, a proposed format was devised for data gathering. This
is described in Appendix E.

2.0  Reliability Model and Estimators

Several  methods  are  suggested  for  the  estimation  of  reliability 
parameters.  These  are  listed  below  along  with  their  equations. 

2.1  Maximum Likelihood Model

Here it is assumed that the initial number of errors is $N$.
Errors are detected at random and are corrected immediately. After
correcting the $(i-1)$th error, the number of remaining errors in the
program is $(N-i+1)$ and the hazard rate is assumed to be proportional
to the number of the remaining errors. Thus, the hazard rate before
detecting the $i$th error is

$$z_i = \phi (N-i+1) \tag{2-1}$$

Assume that $n$ errors were detected and that the detection times are
$t_1, t_2, \ldots, t_n$. Define the time differences as

$$x_i = \begin{cases} t_i - t_{i-1}, & i = 2, 3, \ldots, n \\ t_1, & i = 1 \end{cases} \tag{2-2}$$


Then it is possible to estimate $N$ and $\phi$ by the maximum likelihood estimators
$\hat{N}$ and $\hat{\phi}$. The derivation, given in Appendix A.1, shows that $\hat{N}$ is the
solution of

$$\sum_{i=1}^{n} \frac{1}{\hat{N}-i+1} - \frac{n}{\hat{N} - \dfrac{\sum_{i=1}^{n} (i-1)\, x_i}{\sum_{i=1}^{n} x_i}} = 0 \tag{2-3}$$

and

$$\hat{\phi} = \frac{\sum_{i=1}^{n} \dfrac{1}{\hat{N}-i+1}}{\sum_{i=1}^{n} x_i} \tag{2-4}$$

The MTTF after the $(i-1)$th error correction is

$$T_i = \frac{1}{\hat{\phi}(\hat{N}-i+1)} \tag{2-5}$$
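
The following is a minimal sketch of how (2-3) and (2-4) might be solved numerically. The report does not say which root-finding method was used; bisection is an assumption made here, and the function name, the bracketing strategy and the `n_max` cap are hypothetical. A return value of `None` stands for a sample that does not converge.

```python
def ml_estimate(x, n_max=1.0e6, iterations=200):
    """Solve (2-3) for N by bisection, then compute phi from (2-4).

    x holds the inter-detection times x_1..x_n. Returns (N_hat, phi_hat),
    or None when no sign change of (2-3) is found below n_max.
    """
    n = len(x)
    sum_x = sum(x)
    # r = sum((i-1) * x_i) / sum(x_i); enumerate() starts at 0, i.e. at (i-1).
    r = sum(j * xj for j, xj in enumerate(x)) / sum_x

    def f(N):
        # Left-hand side of (2-3); N - j is N - i + 1 for 1-based i.
        return sum(1.0 / (N - j) for j in range(n)) - n / (N - r)

    lo = n - 1 + 1e-9          # N must exceed n - 1 so that N - n + 1 > 0
    if f(lo) <= 0:
        return None
    hi = float(n)
    while f(hi) > 0:           # expand the bracket until the sign changes
        hi *= 2.0
        if hi > n_max:
            return None
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    N_hat = 0.5 * (lo + hi)
    phi_hat = sum(1.0 / (N_hat - j) for j in range(n)) / sum_x
    return N_hat, phi_hat
```

Feeding the function the expected values $x_i = 1/(\phi(N-i+1))$ instead of random data is a convenient check, since the estimates should then reproduce the parameters that generated the data.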


2.2  Geometric  Maximum  Likelihood  Model 


In this model, the hazard rate decreases by a constant ratio
after the correction of an error. Accordingly, the hazard function
before the detection of the $i$th error is

$$z_i = \lambda_0 a^i \tag{2-6}$$

where $\lambda_0$ and $a$ are constants.
Here one can estimate the most likely values of $\lambda_0$, $a$ and the mean
time to failure. This is done in Appendix A.2. The result of the
analysis shows that the most likely estimates are the solutions of
(2-7) and (2-8):

$$\frac{n(n+1)}{2\hat{a}} - \frac{n \sum_{i=1}^{n} i\, \hat{a}^{i-1} x_i}{\sum_{i=1}^{n} \hat{a}^{i} x_i} = 0 \tag{2-7}$$

$$\hat{\lambda}_0 = \frac{n}{\sum_{i=1}^{n} \hat{a}^{i} x_i} \tag{2-8}$$

The mean time to failure after the correction of the $(i-1)$th error is

$$T_i = \frac{1}{z_i} = \frac{1}{\hat{\lambda}_0 \hat{a}^{i}} \tag{2-9}$$
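
A minimal sketch of solving (2-7) and (2-8) follows. It rests on an assumption not stated in the report: (2-7) is multiplied by $a/n$, which turns it into "(n+1)/2 minus the weighted mean of $i$ with weights $a^i x_i$", a quantity that decreases as $a$ grows, so plain bisection can be used. The function name and the search bracket for $a$ are hypothetical.

```python
def geometric_ml_estimate(x, a_lo=0.1, a_hi=10.0, iterations=200):
    """Solve (2-7) for a by bisection, then compute lambda_0 from (2-8).

    Returns (a_hat, lambda0_hat), or None if the bracket holds no sign change.
    """
    n = len(x)

    def h(a):
        # (2-7) multiplied by a/n: (n+1)/2 - sum(i a^i x_i) / sum(a^i x_i)
        weights = [a ** i * xi for i, xi in enumerate(x, start=1)]
        weighted_index = sum(i * w for i, w in enumerate(weights, start=1))
        return (n + 1) / 2.0 - weighted_index / sum(weights)

    if h(a_lo) <= 0 or h(a_hi) >= 0:
        return None
    for _ in range(iterations):
        mid = 0.5 * (a_lo + a_hi)
        if h(mid) > 0:
            a_lo = mid
        else:
            a_hi = mid
    a_hat = 0.5 * (a_lo + a_hi)
    lambda0_hat = n / sum(a_hat ** i * xi for i, xi in enumerate(x, start=1))
    return a_hat, lambda0_hat
```

The default bracket keeps $a^i$ within floating-point range for a few hundred data points; a wider bracket would need a logarithmic formulation.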
2.3  Least Square x Model

The basic assumptions of this model are the same as those of the
model in Section 2.1. However, instead of determining the most likely
estimates of $N$ and $\phi$, we search for those values which minimize the
sum of the squares of the deviations of $x_i$ from the mean values:

$$E = \sum_{i=1}^{n} \left[ x_i - \frac{1}{\phi(N-i+1)} \right]^2 \tag{2-10}$$

The $\hat{N}$ and $\hat{\phi}$ which minimize $E$, as given by (2-10), are derived in Appendix A.3. It is
found that $\hat{N}$ is the solution of equation (2-11):

$$\sum_{i=1}^{n} \frac{x_i}{(\hat{N}-i+1)^2} \cdot \sum_{i=1}^{n} \frac{1}{(\hat{N}-i+1)^2} - \sum_{i=1}^{n} \frac{x_i}{\hat{N}-i+1} \cdot \sum_{i=1}^{n} \frac{1}{(\hat{N}-i+1)^3} = 0 \tag{2-11}$$

The estimate for $\phi$ is

$$\hat{\phi} = \frac{\sum_{i=1}^{n} \dfrac{1}{(\hat{N}-i+1)^2}}{\sum_{i=1}^{n} \dfrac{x_i}{\hat{N}-i+1}} \tag{2-12}$$

The MTTF after the $(i-1)$th error correction is

$$T_i = \frac{1}{\hat{\phi}(\hat{N}-i+1)} \tag{2-13}$$
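
The sketch below illustrates the least-square idea of (2-10) and (2-12). It deliberately swaps in a different numerical technique from the report: instead of solving (2-11), it scans a grid of candidate values of $N$, computes the best $\phi$ for each from (2-12), and keeps the pair with the smallest $E$. The function names, the grid step and the upper limit `n_max` are hypothetical.

```python
def ls_x_phi(x, N):
    """phi that minimizes (2-10) for a fixed N, as in (2-12)."""
    u = [1.0 / (N - j) for j in range(len(x))]   # 1/(N-i+1) for i = 1..n
    return sum(ui * ui for ui in u) / sum(xi * ui for xi, ui in zip(x, u))


def ls_x_error(x, N, phi):
    """Estimation error E of (2-10)."""
    return sum((xi - 1.0 / (phi * (N - j))) ** 2 for j, xi in enumerate(x))


def ls_x_grid_estimate(x, n_max=500.0, step=0.25):
    """Grid search over N (an assumed substitute for solving (2-11))."""
    best = None
    N = float(len(x))            # smallest N for which every N-i+1 is positive
    while N <= n_max:
        phi = ls_x_phi(x, N)
        err = ls_x_error(x, N, phi)
        if best is None or err < best[2]:
            best = (N, phi, err)
        N += step
    return best                  # (N_hat, phi_hat, E)
```

A finer step, or a local refinement around the best grid point, would be needed for data whose minimum does not fall on the grid.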


2.4  Least Square t Model

This model differs from the previous one by the fact that we
operate on the $t$ times rather than the $x$ times. Note that the times
$t_i$ are given by

$$t_i = \sum_{j=1}^{i} x_j \tag{2-14}$$

The rationale for the selection of $t_i$ as a quantity to fit is that
it may be less sensitive to random changes, due to the summation of the $x_j$.
We compare $t_i$ to its expected value, $\bar{t}_i$, which is given by

$$\bar{t}_i = \sum_{j=1}^{i} \bar{x}_j = \sum_{j=1}^{i} \frac{1}{\phi(N-j+1)} \tag{2-15}$$

Consequently, the sum of the squares of the deviations is

$$E = \sum_{i=1}^{n} (t_i - \bar{t}_i)^2 = \sum_{i=1}^{n} \left[ t_i - \sum_{j=1}^{i} \frac{1}{\phi(N-j+1)} \right]^2 \tag{2-16}$$

The objective is to determine the $\hat{N}$ and $\hat{\phi}$ which minimize (2-16). This is done
in Appendix A.4, where it is shown that $\hat{N}$ is the solution of (2-17):

$$\sum_{i=1}^{n} t_i B_i \sum_{i=1}^{n} A_i^2 - \sum_{i=1}^{n} t_i A_i \sum_{i=1}^{n} A_i B_i = 0 \tag{2-17a}$$

where

$$A_i = \sum_{j=1}^{i} \frac{1}{\hat{N}-j+1} \tag{2-17b}$$

and

$$B_i = \sum_{j=1}^{i} \frac{1}{(\hat{N}-j+1)^2} \tag{2-17c}$$

$\hat{\phi}$ is determined from

$$\hat{\phi} = \frac{\sum_{i=1}^{n} A_i^2}{\sum_{i=1}^{n} t_i A_i} \tag{2-18}$$

The mean time to failure, $T_i$, is given by

$$T_i = \frac{1}{\hat{\phi}(\hat{N}-i+1)} \tag{2-19}$$


2.5  Geometric  Least  Square  x  Model 

Consider the geometric model, where the hazard function after the
correction of the $(i-1)$th error is

$$z_i = \lambda_0 a^i \tag{2-20}$$

The mean time between the detection of the $(i-1)$th and the $i$th error is

$$\bar{x}_i = \frac{1}{z_i} = \frac{1}{\lambda_0 a^i} \tag{2-21}$$

The objective now is to select the parameters $a$ and $\lambda_0$ such that the
sum of the squares of deviations $(x_i - \bar{x}_i)$ is minimized. Thus, the
estimation error is

$$E = \sum_{i=1}^{n} (x_i - \bar{x}_i)^2 = \sum_{i=1}^{n} \left( x_i - \frac{1}{\lambda_0 a^i} \right)^2 \tag{2-22}$$

The parameters $\hat{\lambda}_0$ and $\hat{a}$ which minimize $E$ are derived in Section A.5. It
is found that $\hat{a}$ is the solution of (2-23), and $\hat{\lambda}_0$ is obtained from
(2-24).

$$\sum_{i=1}^{n} \frac{i\, x_i}{\hat{a}^{i}} \sum_{i=1}^{n} \frac{1}{\hat{a}^{2i}} - \sum_{i=1}^{n} \frac{x_i}{\hat{a}^{i}} \sum_{i=1}^{n} \frac{i}{\hat{a}^{2i}} = 0 \tag{2-23}$$

$$\hat{\lambda}_0 = \frac{\sum_{i=1}^{n} \dfrac{1}{\hat{a}^{2i}}}{\sum_{i=1}^{n} \dfrac{x_i}{\hat{a}^{i}}} \tag{2-24}$$

The mean time to failure for the $i$th error, $T_i$, is given by

$$T_i = \frac{1}{\hat{\lambda}_0 \hat{a}^{i}} \tag{2-25}$$


2.6  Geometric  Least  Square  t  Model 

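The geometric least square model is applied to the cumulative
time to failure $t_i$. Consequently, the estimation error, $E$, is

$$E = \sum_{i=1}^{n} (t_i - \bar{t}_i)^2 \tag{2-26}$$

where

$$t_i = \sum_{j=1}^{i} x_j \tag{2-27}$$

and

$$\bar{t}_i = \sum_{j=1}^{i} \bar{x}_j = \sum_{j=1}^{i} \frac{1}{\lambda_0 a^j} \tag{2-28}$$

Therefore, $E$ may be written as

$$E = \sum_{i=1}^{n} \left( t_i - \sum_{j=1}^{i} \frac{1}{\lambda_0 a^j} \right)^2 \tag{2-29}$$

The parameters $\hat{a}$ and $\hat{\lambda}_0$ which minimize $E$ are derived in Section A.6.
$\hat{a}$ is the solution of eq. (2-30):

$$\sum_{i=1}^{n} t_i D_i \sum_{i=1}^{n} C_i^2 - \sum_{i=1}^{n} t_i C_i \sum_{i=1}^{n} C_i D_i = 0 \tag{2-30a}$$

where

$$C_i = \sum_{j=1}^{i} \frac{1}{\hat{a}^{j}} \tag{2-30b}$$

$$D_i = \sum_{j=1}^{i} \frac{j}{\hat{a}^{j}} \tag{2-30c}$$

$\hat{\lambda}_0$ is found from:

$$\hat{\lambda}_0 = \frac{\sum_{i=1}^{n} C_i^2}{\sum_{i=1}^{n} t_i C_i} \tag{2-31}$$

and $T_i$ is given by

$$T_i = \frac{1}{\hat{\lambda}_0 \hat{a}^{i}} \tag{2-32}$$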


2.7  Exponential  Least  Square  Model 

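The following model is based on the work reported by Musa [7],
where a continuous model is assumed for $N$. According to that
model, the expected number of corrected errors, $N_c$, is

$$N_c = N (1 - e^{-\phi t}) \tag{2-33}$$

where $\phi$ is a constant and $N$ is the initial number of errors. Also,
the expected time until the $i$th error is detected, $\bar{t}_i$, is found to be

$$\bar{t}_i = -\frac{1}{\phi} \ln\left(1 - \frac{i}{N}\right) \tag{2-34}$$

Accordingly, we seek the parameters $\hat{\phi}$ and $\hat{N}$ which minimize the estimation
error $E$:

$$E = \sum_{i=1}^{n} (t_i - \bar{t}_i)^2 = \sum_{i=1}^{n} \left[ t_i + \frac{1}{\phi} \ln\left(1 - \frac{i}{N}\right) \right]^2 \tag{2-35}$$

It is shown in Appendix A that the best estimate for $N$ is the solution
of (2-36):

$$\sum_{i=1}^{n} \frac{i\, t_i}{\hat{N}-i} \sum_{i=1}^{n} \ln^2\left(\frac{\hat{N}-i}{\hat{N}}\right) - \sum_{i=1}^{n} t_i \ln\left(\frac{\hat{N}-i}{\hat{N}}\right) \sum_{i=1}^{n} \frac{i}{\hat{N}-i} \ln\left(\frac{\hat{N}-i}{\hat{N}}\right) = 0 \tag{2-36}$$

The estimate for $\phi$ is found from

$$\hat{\phi} = \frac{\sum_{i=1}^{n} \ln^2\left(\dfrac{\hat{N}-i}{\hat{N}}\right)}{\sum_{i=1}^{n} t_i \ln\left(\dfrac{\hat{N}}{\hat{N}-i}\right)} \tag{2-37}$$

The MTTF before the $i$th error, $T_i$, is given by

$$T_i = \bar{t}_i - \bar{t}_{i-1} \tag{2-38}$$

where $\bar{t}_i$ is given by (2-34). This is found to be

$$T_i = \frac{1}{\hat{\phi}} \ln \frac{\hat{N}-i+1}{\hat{N}-i} \tag{2-39}$$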
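
The relation between (2-34), (2-38) and (2-39) can be checked numerically with a short sketch. The function names are hypothetical; only the formulas come from the text.

```python
import math


def expected_detection_time(i, N, phi):
    """Expected time until the i-th detection in the Exponential model, (2-34)."""
    if not 0 <= i < N:
        raise ValueError("i must satisfy 0 <= i < N")
    return -math.log(1.0 - i / N) / phi


def exponential_mttf(i, N, phi):
    """MTTF before the i-th error, closed form (2-39)."""
    return math.log((N - i + 1) / (N - i)) / phi
```

For any valid $i$, the difference `expected_detection_time(i, N, phi) - expected_detection_time(i - 1, N, phi)` of (2-38) should agree with `exponential_mttf(i, N, phi)` up to rounding.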
3.0  Test and Evaluation of Estimators

The various estimators which are described in Section 2 were
tested and evaluated in order to verify that the equations are correct
and have no errors. Also, we want to determine the quality of the
estimators from the points of view of convergence and accuracy.

The first task is relatively easy. It was done by testing the
estimators with deterministic data rather than random data. In other words,
instead of having a sequence of random numbers, $x_i$, to analyze, we
generate a sequence of the expected values of $x_i$ for the parameters
$N = 60$, $n = 50$, $\phi = 0.1$, with the corresponding MTTF of $T = 1.0$. Since
the data is not random, all the estimators should estimate the parameters
$N$ and $T$ exactly. This was finally achieved after correcting several
errors in the program. This method was found very useful in debugging
the estimator program.
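
The deterministic test data described above can be sketched as follows, assuming the Standard model of (1-4). The function names are hypothetical; the parameter values are the ones quoted in the text.

```python
def expected_intervals(N, n, phi):
    """Expected inter-detection times x_i = 1 / (phi * (N - i + 1)), i = 1..n."""
    return [1.0 / (phi * (N - i + 1)) for i in range(1, n + 1)]


def mttf_after(N, n, phi):
    """MTTF after n errors have been detected and corrected: 1 / (phi * (N - n))."""
    return 1.0 / (phi * (N - n))
```

With N = 60, n = 50 and phi = 0.1, ten errors remain after the last detection, so `mttf_after(60, 50, 0.1)` reproduces the quoted value T = 1.0.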

The next task is more difficult as real data is not available
for testing. The next best thing to real data is randomly generated
data with the desired exponential probability density function. Thus, the
data was generated randomly with the exponential probability density
function

$$f(x_i) = \phi(N-i+1)\, e^{-\phi(N-i+1) x_i} \tag{3-1}$$

where the index $i$ was adjusted for each simulated time. Another point
of significance in simulating the data was to verify that the generated
time intervals, $x_i$, are independent of each other. This was examined
by defining the variable $y_i$ by (3-2):

$$y_i = e^{-\phi(N-i+1) x_i} \tag{3-2}$$

The resulting random variable, $y_i$, is uniformly distributed in the
interval (0,1). In order to examine the dependence between the various
$y_i$'s, we evaluated the correlation function $R(k)$:

$$R(k) = \frac{1}{n-k} \sum_{i=1}^{n-k} \left(y_i - \tfrac{1}{2}\right)\left(y_{i+k} - \tfrac{1}{2}\right) \tag{3-3}$$

It was found that $R(0)$ equals 0.08, as expected for a variable uniformly
distributed in (0,1), whose variance is 1/12, while all the quantities
$R(k)$, for $k$ between 1 and 10, were close to zero, indicating that the
$y_i$ values, and therefore the $x_i$ values too, are independent.
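
The simulation and the independence check described above might be sketched as follows. Several details are assumptions rather than the report's own procedure: the standard-library generator `random.Random.expovariate` stands in for whatever random number source was used, the transform maps each $x_i$ through $e^{-z_i x_i}$, and the correlation is taken about the uniform mean 1/2. All function names are hypothetical.

```python
import math
import random


def simulate_intervals(N, n, phi, rng):
    """Draw x_i from an exponential density with rate phi * (N - i + 1), as in (3-1)."""
    return [rng.expovariate(phi * (N - i + 1)) for i in range(1, n + 1)]


def uniform_transform(x, N, phi):
    """Map each x_i to y_i = exp(-phi * (N - i + 1) * x_i), uniform on (0, 1)."""
    return [math.exp(-phi * (N - i + 1) * xi) for i, xi in enumerate(x, start=1)]


def correlation(y, k):
    """Lag-k correlation of the y_i about the uniform mean 1/2 (assumed form)."""
    m = len(y) - k
    return sum((y[i] - 0.5) * (y[i + k] - 0.5) for i in range(m)) / m


rng = random.Random(1)                      # fixed seed, for repeatability only
x = simulate_intervals(60, 50, 0.1, rng)
y = uniform_transform(x, 60, 0.1)
lags = [correlation(y, k) for k in range(11)]
```

With only 50 points per sequence the individual values in `lags` are noisy; averaging over many simulated sequences gives a steadier picture of R(0) near 1/12 and of the higher lags near zero.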

In order to make the test statistically significant, 1000 random
sequences were generated for each estimator and the parameters were
estimated. The test results are presented as histograms which show the
frequency of estimating a certain parameter.

Following are four histograms for estimating $N$, the initial number
of errors. The data was generated randomly on the basis of the parameters
$N = 60$ and $\phi = 0.02$; the right estimate for $N$ would be 60. However, due to
the randomness of the data, the estimates are spread over a wide range.
The histograms indicate the frequency of estimating $\hat{N}$. Also included
is the cumulative frequency (C.D.F.), which indicates the number of
estimates being less than or equal to a certain value of $\hat{N}$. Note that
all the histograms are similar in shape, indicating that all four
estimators are similar in their behavior.

Comparing the histograms we observe that their general shape is the
same. The median point, below which 50% of the estimates fall,
is 60 or 61 for the estimators. The convergence rate is very high in
all four models. It varied between 996 and 999 converging samples
out of 1000.

Another feature of interest is the estimate of the mean time to
failure, $T$. This parameter was estimated by all seven methods. The
estimators operated on random data which was generated for the parameters
$N = 60$, $n = 50$ and $\phi = 0.1$. The MTTF after detecting the 50 errors should
be 1.0; however, the estimate varies due to the randomness of the data.
A histogram for the geometric maximum likelihood model is given below.
In this case all the estimates converged between the values of 0.25 and
1.75.

The convergence was not always so good, especially in the models
which estimated $N$ first. A summary of the results is given in Table 3.1.
The table contains the number of samples for which the estimate has
converged. Also, it gives the values of $\hat{T}$, the MTTF, for various
percentiles of the estimates. For example, the first row indicates that
25% of the estimates of $\hat{T}$, using the maximum likelihood method, were
below 0.70. The results of Table 3.1 allow us to conclude the
following:

a)  The estimates of $T$ which are based on the Geometric models are
better. The variance of the estimate is smaller than that
resulting from the Standard or the Exponential models. The
reason for this is that the Geometric model does not require
the estimate of $N$, which is very sensitive to random
variations in the data.

b)  The least square estimates which are based on the x data are
always worse than those based on the t data. The reason for this is
that the t times are the summation of the x's, and therefore
they "smooth" out the randomness of x.


Table 3.1  Summary of results of MTTF estimates

c)  We note that the general spread of the estimates of $\hat{T}$ is much
smaller than that of $\hat{N}$. The reason is that $\hat{T}$ is derived from
the detection rate, whereas $\hat{N}$ has to be found from the change
in the error detection rate, and a change is more sensitive to
randomness, in the way that the derivative of a function is more
sensitive to noise.

Another question which interests us is the correlation between
estimators. That is, if a set of data produces a large estimate by one
method, would the same set of data produce large estimates using the
other methods? In order to examine this, we have determined all
seven estimators for 20 sets of data. Note that although the data was
generated randomly, the same sets of data were applied to all the
estimators. The results were the estimates of $\hat{N}$ and $\hat{T}$ by the Standard
and the Exponential models, and estimates for $a$ and $T$ by the Geometric
models. All the data points were generated for the parameters $N = 60$,
$n = 50$ and $\phi = 0.1$, with the resulting $T = 1.0$. Thus, the correct values are
$N = 60$ and $T = 1.0$. The actual estimates for all the 20 samples are given
in Table 3.2.

An examination of the results of Table 3.2 reveals several
interesting points:

a)  There is a strong correlation between the estimators. When
a set of data produces a small estimate of $N$, it does so
with all the estimators (samples 1, 3, 7, 9, 15). Similarly,
when the estimates are high, they are high with all the
estimators (samples 5, 10, 12, 13).

b)  Whenever the estimate of $N$ is high, the estimate of $T$ is
low and vice versa; low estimates of $N$ give high estimates
of $T$.

c)  The three methods of the Maximum-Likelihood, the Least-Square
t and the Exponential models give almost identical
estimates, whereas the Least Square x model gives different
estimates.

d)  The estimates for $T$ resulting from the Geometric estimators
are usually lower than those given by the Standard estimators.
The reason for this is that the data is generated
randomly according to the Standard model. When we try
to fit a Geometric model to it, we obtain a lower
estimate.

e)  In spite of the high correlation between the estimators,
it is worthwhile to evaluate all of them, as this gives a
wider base for estimating $N$ and $T$.

Table 3.2  Comparison between estimators with identical data samples ($N = 60$, $n = 50$, $\phi = 0.1$, $T = 1.0$)

4.0  Reliability Models for Delayed Error Correction

The objective of this section is to modify the reliability models
and estimators of Section 2 to fit the situation where error correction
is delayed. The program is loaded on a tape and each tape version is
tested and corrected. While the program is being tested, errors are
detected and recorded. Some of these errors, along with some other errors
which were detected by other means, are corrected at the end of the test
period. The corrections appear on the newer tape version.

Define the following variables:

$t_i$ = time when the $i$th error is detected. This is the
execution time and not the calendar time.

$x_i = t_i - t_{i-1}$ = time between the detection of the $(i-1)$th
error and the $i$th error.

$k$ = number of tape versions.

$n_j$ = number of errors that were found in the $j$th tape
version.

$m_j$ = number of errors which were corrected in the $(j+1)$th
tape but not in the $j$th tape.

$M_j = m_1 + m_2 + \cdots + m_{j-1}$ = cumulative number of errors which
were corrected in the $j$th tape version.

$N$ = initial number of errors.

$N_j = N - M_j = N - m_1 - \cdots - m_{j-1}$ = number of errors that
remain in the $j$th tape.

$I_j$ = the set of integers which includes the indices of the
errors found in the $j$th tape.

For example, suppose that the program used two tape versions.
Four errors were found in the first tape, of which three were
corrected before the second tape was introduced. Three more errors
were found in the second tape. The number of tapes here is $k = 2$.
The numbers of errors found are $n_1 = 4$ and $n_2 = 3$. The number of errors
which were corrected is $m_1 = 3$. Therefore we have $M_1 = 0$ and $M_2 = 3$.
The set $I_1$ includes the numbers [1,2,3,4] and $I_2$ includes
[5,6,7]. Based on the above notation, we modify the models of
Section 2 as follows.
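
The bookkeeping in the example above can be sketched in a few lines. The function name and argument layout are hypothetical, and the value N = 10 in the usage note is an arbitrary illustration, not a figure from the report.

```python
def tape_bookkeeping(found, corrected, N=None):
    """Derive M_j, I_j and (optionally) N_j from per-tape counts.

    found[j-1] is n_j, the number of errors found in tape j;
    corrected[j-1] is m_j, the number corrected between tape j and tape j+1.
    """
    k = len(found)
    # M_j = m_1 + ... + m_{j-1}; an empty sum gives M_1 = 0.
    M = [sum(corrected[:j]) for j in range(k)]
    index_sets = []
    start = 1
    for n_j in found:
        index_sets.append(list(range(start, start + n_j)))  # indices of errors in tape j
        start += n_j
    remaining = None if N is None else [N - M_j for M_j in M]  # N_j = N - M_j
    return M, index_sets, remaining
```

For the two-tape example, `tape_bookkeeping([4, 3], [3], N=10)` returns `M = [0, 3]`, the index sets `[[1, 2, 3, 4], [5, 6, 7]]`, and the remaining-error counts `[10, 7]` for the assumed N.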


4.1  Maximum  Likelihood  Model 

This model can be easily modified to this case, as the hazard
function is assumed to be proportional to the number of remaining
errors. Let the number of errors in the $j$th tape be $N_j$; then we have

$$N_j = N - M_j$$

Accordingly, the hazard function for the $j$th tape is

$$z_j = \phi N_j = \phi (N - M_j) \tag{4-3}$$

The modified model is derived in Appendix B. $\hat{N}$, the most likely
estimate for $N$, is the solution of the equation

and $\hat{\phi}$ is derived from

and $\hat{\lambda}_0$ is found from

$$\hat{\lambda}_0 = \frac{n}{\sum_{j=1}^{k} \hat{a}^{j} \sum_{i \in I_j} x_i}$$

The MTTF for the $j$th tape is

(4-10)

(4-11)

(4-12)

For model II the estimate $\hat{a}$ is found from

$$\sum_{j=1}^{k} n_j M_j - \frac{n \sum_{j=1}^{k} \left( M_j\, \hat{a}^{M_j} \sum_{i \in I_j} x_i \right)}{\sum_{j=1}^{k} \left( \hat{a}^{M_j} \sum_{i \in I_j} x_i \right)} = 0 \tag{4-13}$$

and $\hat{\lambda}_0$ is determined from

$$\hat{\lambda}_0 = \frac{n}{\sum_{j=1}^{k} \left( \hat{a}^{M_j} \sum_{i \in I_j} x_i \right)} \tag{4-14}$$

It can be seen that model I can be obtained as a special case of
model II by substituting $M_j = j$.


4.3  Least  Square  x  Model 


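The hazard function $z_i$, when the $j$th tape is used, is

$$z_i = \phi N_j, \quad i \in I_j \tag{4-16}$$

Accordingly, the mean value of $x_i$ is

$$\bar{x}_i = \frac{1}{\phi N_j} \quad \text{where } i \in I_j \tag{4-17}$$

The estimation error to be minimized here is

$$E = \sum_{i=1}^{n} (x_i - \bar{x}_i)^2 = \sum_{j=1}^{k} \sum_{i \in I_j} \left( x_i - \frac{1}{\phi N_j} \right)^2 \tag{4-18}$$

The parameters $\hat{N}$ and $\hat{\phi}$ which minimize this quantity are derived in
Appendix B. With $\hat{N}_j = \hat{N} - M_j$, $\hat{N}$ is the solution of the equation

$$\sum_{j=1}^{k} \left( \frac{1}{\hat{N}_j^2} \sum_{i \in I_j} x_i \right) \sum_{j=1}^{k} \frac{n_j}{\hat{N}_j^2} - \sum_{j=1}^{k} \left( \frac{1}{\hat{N}_j} \sum_{i \in I_j} x_i \right) \sum_{j=1}^{k} \frac{n_j}{\hat{N}_j^3} = 0 \tag{4-19}$$

and $\hat{\phi}$ is found from

$$\hat{\phi} = \frac{\sum_{j=1}^{k} \dfrac{n_j}{\hat{N}_j^2}}{\sum_{j=1}^{k} \left( \dfrac{1}{\hat{N}_j} \sum_{i \in I_j} x_i \right)} \tag{4-20}$$

The MTTF equals

$$T_j = \frac{1}{\hat{\phi} \hat{N}_j} \tag{4-21}$$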
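
A short sketch of the estimation error for the delayed correction case is given below. It assumes the piecewise-constant mean $1/(\phi N_j)$ with $N_j = N - M_j$ for every error index in $I_j$; the function name and the numerical values in the usage note are hypothetical.

```python
def delayed_ls_x_error(x, index_sets, M, N, phi):
    """Sum of squared deviations of x_i from 1 / (phi * (N - M_j)), i in I_j.

    x[i-1] is the i-th inter-detection time, index_sets[j-1] is I_j and
    M[j-1] is M_j, the cumulative number of corrections before tape j.
    """
    total = 0.0
    for I_j, M_j in zip(index_sets, M):
        mean = 1.0 / (phi * (N - M_j))      # expected interval while tape j is in use
        total += sum((x[i - 1] - mean) ** 2 for i in I_j)
    return total
```

For data equal to the expected intervals, for example four values of 1/(0.1 * 10) followed by three values of 1/(0.1 * 7) with `M = [0, 3]`, the error vanishes at N = 10, phi = 0.1 and is positive for other parameter pairs, which mirrors the deterministic test used for the instant correction estimators.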


4.4  Least  Square  t  Model 

This model fits the cumulative time, $t_i$, to its expected value.
Since the hazard function is constant during the use of a certain
tape, the mean time between failures will be constant in that interval
too.

$$\bar{x}_i = \frac{1}{\phi N_j} \quad \text{where } i \in I_j \tag{4-22}$$

To simplify the notation, define

$$N_{(m)} = N_j \quad \text{where } m \in I_j \tag{4-23}$$

Then we can write

$$\bar{x}_i = \frac{1}{\phi N_{(i)}} \tag{4-24}$$

In view of (4-23), the estimation error, $E$, equals

$$E = \sum_{i=1}^{n} (t_i - \bar{t}_i)^2 = \sum_{i=1}^{n} \left( t_i - \sum_{m=1}^{i} \frac{1}{\phi N_{(m)}} \right)^2 \tag{4-25}$$

The estimation of $\hat{N}$ and $\hat{\phi}$ which minimize (4-25) is given in
Appendix B. $\hat{N}$ is the solution of (4-26)

$$\sum_{i=1}^{n} t_i B_i \sum_{i=1}^{n} A_i^2 - \sum_{i=1}^{n} t_i A_i \sum_{i=1}^{n} A_i B_i = 0 \tag{4-26}$$

where

$$A_i = \sum_{m=1}^{i} \frac{1}{\hat{N}_{(m)}} \tag{4-27}$$

$$B_i = \sum_{m=1}^{i} \frac{1}{\hat{N}_{(m)}^2} \tag{4-28}$$

Also, $\hat{\phi}$ is given by

$$\hat{\phi} = \frac{\sum_{i=1}^{n} A_i^2}{\sum_{i=1}^{n} t_i A_i} \tag{4-29}$$

The MTTF, $T_j$, is

$$T_j = \frac{1}{\hat{\phi} \hat{N}_j} \tag{4-30}$$

4.5  Geometric  Least  Square  x  Model 

Section 4.2 presented two forms of the geometric model. Since
it was shown that model II is more general, it will be considered
in this section and in the following one.

Here again we use the notation

$$M_{(i)} = M_j \quad \text{where } i \in I_j \tag{4-31}$$

and recall (4-9)

$$M_j = m_1 + m_2 + \cdots + m_{j-1} \tag{4-32}$$

The objective of the model is to estimate the $a$ and $\lambda_0$ which minimize $E$:

$$E = \sum_{i=1}^{n} (x_i - \bar{x}_i)^2 = \sum_{i=1}^{n} \left( x_i - \frac{1}{\lambda_0 a^{M_{(i)}}} \right)^2 \tag{4-33}$$

The estimator $\hat{a}$ is found in Appendix B to be the solution of (4-34)

$$\sum_{i=1}^{n} \frac{M_{(i)}\, x_i}{\hat{a}^{M_{(i)}}} \sum_{i=1}^{n} \frac{1}{\hat{a}^{2 M_{(i)}}} - \sum_{i=1}^{n} \frac{x_i}{\hat{a}^{M_{(i)}}} \sum_{i=1}^{n} \frac{M_{(i)}}{\hat{a}^{2 M_{(i)}}} = 0 \tag{4-34}$$

The estimate for $\lambda_0$ is given by (4-35)

$$\hat{\lambda}_0 = \frac{\sum_{i=1}^{n} \dfrac{1}{\hat{a}^{2 M_{(i)}}}}{\sum_{i=1}^{n} \dfrac{x_i}{\hat{a}^{M_{(i)}}}} \tag{4-35}$$

The MTTF is given by

$$T_j = \frac{1}{\hat{\lambda}_0 \hat{a}^{M_j}} \tag{4-36}$$


4.6  Geometric  Least  Square  t  Model 

The error resulting from fitting the $t_i$ values with their mean is

The estimate $\hat{a}$ for $a$ is found from

5.0  Test and Evaluation of Estimators for the Delayed Correction Case

The modified estimators, given in Section 4, were tested and evaluated.
Here again, the objective is to verify that the derivation and
the computer program are correct, and to examine the quality of the
estimators.

The first part was done by testing the estimators with deterministic
data. Instead of generating a sequence of random numbers,
$x_i$, we generated a sequence of the expected values $\bar{x}_i$ for some given
parameters $N = 60$, $T = 1.0$. The resulting estimates should equal $\hat{N} = 60$
and $\hat{T} = 1.0$ if the derivation and the program are correct. Indeed,
after some small corrections, all the estimates were equal to the
desired values.

The next task of examining the estimators was done in a similar way
to the method of Section 3. Random sequences of times were generated
to simulate the error detection process with the parameters N = 120,
n = 100 and $\phi$ = 0.05, with a resulting MTTF of T = 1.0. 1000 such
sequences were generated, and the various estimators were determined
for them. The results of the estimators of N are given in the
following pages in terms of histograms. These include estimates of N
determined by the Maximum-Likelihood estimator and by the Least-Square
estimators for both x and t. Note that the "correct" estimate is
N = 120 and this value is the median for the three histograms. The
histograms have the same general shape and are similar to those
obtained in Section 3.
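A minimal sketch of how such random sequences might be generated is given below. It assumes the simplest case, in which each interdetection time $x_i$ is exponential with rate phi (N - i + 1) as in (A1); the piecewise constant hazard of the Delayed Correction case is not modeled here. The function names and the seed are hypothetical.

```python
import random


def simulate_interdetection_times(N, n, phi, rng):
    """Draw x_1..x_n with x_i ~ Exponential(rate = phi * (N - i + 1))."""
    # k = i - 1 runs over 0..n-1, so the rate is phi * (N - k)
    return [rng.expovariate(phi * (N - k)) for k in range(n)]


def simulate_runs(N=120, n=100, phi=0.05, runs=1000, seed=1):
    """Generate `runs` independent sequences with a reproducible seed."""
    rng = random.Random(seed)
    return [simulate_interdetection_times(N, n, phi, rng) for _ in range(runs)]


runs = simulate_runs()
# MTTF after n corrections for these parameters: 1 / (phi * (N - n))
mttf = 1.0 / (0.05 * (120 - 100))
print(len(runs), len(runs[0]), mttf)
```

Each sequence can then be passed to whichever estimator is under test, and the resulting estimates collected into a histogram.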
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Another  quantity  of  interest  is  the  estimate  of  the  MTTF.  We 
have estimated the MTTF from the 1000 sets of data, using all six
estimators. The results are presented by some histograms and by Table
5.1. The following page shows a histogram of the MTTF, using the
Maximum Likelihood estimator. Note that the histogram is skewed and
that the "correct" value of T = 1.0 is the median. This is typical
for the Standard estimators, which determine both $\hat{N}$ and
$\hat{T}$. This is followed by three histograms of estimates by the
Geometric models, and the change is significant. Here we note that
the histogram shape resembles the normal distribution curve and that
the spread is much smaller than in the previous case. Here again we
observe that the mean value of the geometric estimators is
considerably below the "correct" value of T = 1.0. The reason for
this is that the data were generated according to the Standard model,
and when we try to fit a Geometric model to it, we end with a smaller
value of $\hat{T}$. The estimator results are analyzed further and the
main results are summarized in Table 5.1. These include the number of
samples for which the estimate has converged, various percentiles of
the estimates and the range of the estimators. The results of Table
5.1 are similar to those of Table 3.1 and they lead to the conclusions
made in Section 3, namely, that the estimates of $\hat{T}$ using the
Geometric model are generally better than the Standard estimators,
and that the LS estimates of $\hat{T}$ based on t are better than
those generated from the x data.
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Table  5.1  Summary  o£  results  of  MTTF  estimates  for  piecewise  constant  hazard 
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Another  question  which  Interest  us  is  the  correlation  between 
estimators. That is, if one estimator produces a large estimate by one
method, would the same set of data produce large estimates using the
other methods? In order to examine this, we have determined all the
six estimators for 20 sets of data. Note that although the data were
generated randomly, the same sets of data were applied to all the
estimators. The results were the estimates of N and T, by the Standard
models, and estimates for $\hat{a}$ and $\hat{T}$ by the Geometric
models. All the data points were generated for the parameters N = 120,
n = 100 and $\phi$ = 0.05, with the resulting T = 1.0. Thus, the
correct values are N = 120 and T = 1.0. The actual estimates for all
the 20 samples are given in Table 5.2.

An  examination  of  the  results  of  Table  5.2  leads  to  the  same 
conclusions  derived  from  Table  3.2.  These  are: 

a)  There  is  a  strong  correlation  between  the  estimators.  When 
a  set  of  data  produces  a  small  estimate  of  N,  it  does  it 
with  all  the  estimators  (samples  3,  7,  8,  15).  Similarly, 
when  the  estimates  are  high,  they  are  high  with  all  the 
estimators  (samples  9,  14,  18,  19). 

b)  Whenever  the  estimate  of  N  is  high,  the  estimate  of  T  is 
low  and  vice  versa,  low  estimates  of  N  give  high  estimates 
of  T. 

c)  The methods of the Maximum-Likelihood and the Least-Square
t models give similar estimates whereas the Least Square
x model often gives different estimates.

d)  The  estimates  for  T,  resulting  from  the  Geometric  estimators 
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Table  5.2  Comparison  between  estimators  wich  identical  data  samples 


Sample Number | Maximum Likelihood | Least Square x Model | Least Square t Model | Geometric Maximum Likelihood | Geometric LS x Model | Geometric LS t Model


119  0.34  113 


0.36  115 


109 


119 


1.29  113 


2.00  140 


1.57  115 


1.59  115 


138  0.62  134 


116  1.13  113 


130  0.75  118 


0.68  130 


0.33  103 


124  j  0.77  130 


1.19  111 


0.93  119 


109  1.94  117 


138  0.77  142 


128  0.S1  127 


123  1.14  116 


1.11  0.985  0.529  0.984  0.567  0.9S4  0. 56  7 


1.11 

0.985 

1.07 

0.936 

2.01 

0.983 

1.04 

0.983 

1.040.983  0.693  0.937  0.603  0.986  0.649 


0.95  0.985  0.581  0.933  0.660  0.9S6  0.536 


0.9S3  0.749  0.990  0.491 


1.16  0.935  0.627  0.982  0.757  0.9S6  0.604 


.32  0.934 


0.933  0.315  0.936  0.703 


0.66  0.986  0.584  0.9SS  0.527  0.98S  0.525 

0.986  0.556  0.983  0.679  01985  0.6C3 


1.05  0.935  0.632  0.9S7  0.596  0.935  0.650 

- , - 

0.61  0.986  0.477  I  0.9S6  0.481  0.98S 


1.90  0.987  |  0.567  I  0.9S6  0.614  I  0.983  !  0.70S 


0.67  0.9S9  0.453  0.937  0.536  0.939  0.460 
1.600.935  0.633  j  0.9S3  0.723  0.934  O.o95 

- - _| - - -j- - 

0.99  0.986  I  0.577  i  0.984  !  0.665 


1.23  0.984  0.773  0.932  0.861  0.986  0.574 


0.990  0.591  0.939  0.613  0.990  0.564 


0.35  0.988  0.561  j  0.937  j  0.611  0.987  0.593 


1.470.983  0.933 j  0.934  0.362  I  0.935  0.32S 


120.3  |  1.09  120.5  1.12  0.9857  0.622  I  0.934“  0.66 


0.41  9.9  0.40  0.0019  0.117  0.0021  0.112  0.0083 
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are  usually  lower  than  those  given  by  the  Standard  estimators. 
The  reason  for  this  is  that  the  data  is  generated  randomly 
according  to  the  Standard  model.  When  we  try  to  fit  a 
Geometric model to it, we obtain a lower estimate.

e)  In spite of the high correlation between the estimators,
it is worthwhile to evaluate all of them, as this gives a
wider base for estimating N and T.
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6.0  Accuracy  of  Estimates 

In addition to obtaining the estimates $\hat{N}$ and $\hat{T}$, we
wish to determine the accuracy of the results. This is done by two
different methods. The first method is the development of confidence
intervals for the estimators, and the second method is the examination
of the effect of N, the initial number of errors, and n, the number of
detected errors, on the accuracy of the estimators.

In order to simplify the analysis, we discuss only the Instant
Correction case, where errors are corrected immediately after
detection.

Two methods can be used for the development of confidence
intervals. The first one is based on the fact that maximum likelihood
estimators which are based on large samples of data are normally
distributed, with the true value as a mean. The resulting confidence
interval is called "large sample confidence interval." The second
method for constructing confidence intervals is a general one and
does not rely on the assumption of normal distribution. The two
methods are described in [10]. For the purpose of completeness, we
present the two methods in Sections 6.1 and 6.2.

The  second  method  of  evaluating  the  effect  of  N  and  n  on  the 
accuracy  of  the  estimator  is  given  in  Section  6.3. 

6.1  Large  Sample  Confidence  Intervals 

This method is based on the assumption that the sample size
is large enough to result in normally distributed estimators
$\hat{N}$ and $\hat{T}$. This assumption is accepted by researchers
[12] for samples of size n > 30.

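The method involves two steps: the evaluation of the variance,
and the construction of the confidence interval. The variance can
be calculated by the method which is given in Appendix C. It is
found there that the variance of $\hat{N}$ is:

$$\mathrm{Var}(\hat{N}) = \frac{n}{S n - A^2 \hat{\phi}^2} \tag{6-1}$$

where

$$A = \sum_{i=1}^{n} x_i \tag{6-2}$$

and

$$S = \sum_{i=1}^{n} \frac{1}{(\hat{N}-i+1)^2} \tag{6-3}$$

Similarly, the variance of $\hat{T}$ is found to be:

$$\mathrm{Var}(\hat{T}) = \frac{1}{\Delta}\left(S - \frac{n}{(\hat{N}-n)^2} + \frac{2B}{(\hat{N}-n)^3 \hat{T}}\right) \tag{6-4}$$

where S is given by (6-3) and

$$B = \sum_{i=1}^{n} (n-i+1)\, x_i \tag{6-5}$$

Also

$$\Delta = \left(S - \frac{n}{(\hat{N}-n)^2} + \frac{2B}{(\hat{N}-n)^3 \hat{T}}\right)\left(\frac{-n}{\hat{T}^2} + \frac{2A}{\hat{T}^3} + \frac{2B}{(\hat{N}-n)\hat{T}^3}\right) - \frac{B^2}{(\hat{N}-n)^4 \hat{T}^4} \tag{6-6}$$

Once the variance is determined, the confidence interval is
given by

$$\hat{N} \pm \lambda_{\alpha/2} \sqrt{\mathrm{Var}(\hat{N})} \tag{6-7}$$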
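As a rough numerical illustration of (6-1) to (6-3) and (6-7), the sketch below computes a large-sample interval for N. It assumes that the multiplier in (6-7) is the two-sided standard normal quantile, and it uses the expected interdetection times for the hypothetical values N = 120, n = 100 and phi = 0.05 as a stand-in for real data. The function names are made up for this example.

```python
from math import sqrt
from statistics import NormalDist


def var_N(x, N_hat, phi_hat):
    """Variance of the estimate of N according to (6-1)."""
    n = len(x)
    A = sum(x)                                           # (6-2)
    # k = i - 1 runs over 0..n-1, so N_hat - k = N_hat - i + 1
    S = sum(1.0 / (N_hat - k) ** 2 for k in range(n))    # (6-3)
    return n / (S * n - A ** 2 * phi_hat ** 2)           # (6-1)


def large_sample_interval(x, N_hat, phi_hat, level=0.90):
    """Two-sided interval N_hat +/- z * sqrt(Var), assuming normality."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sqrt(var_N(x, N_hat, phi_hat))
    return N_hat - half_width, N_hat + half_width


# Stand-in data: expected values of x_i for N = 120, n = 100, phi = 0.05.
N_hat, n, phi_hat = 120.0, 100, 0.05
x = [1.0 / (phi_hat * (N_hat - k)) for k in range(n)]

variance = var_N(x, N_hat, phi_hat)
low, high = large_sample_interval(x, N_hat, phi_hat, level=0.90)
print(variance, low, high)
```

When phi is taken from (A6), the denominator of (6-1) is positive by the Cauchy-Schwarz inequality, so the variance is well defined for any N greater than n - 1.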
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6.2  General  Confidence  Intervals 

The method used in the preceding section is based on the assumption
that a large number of errors were corrected. Here we present
a general method which does not require large samples of errors. The
method, which is given by [10], is described in Appendix D. We
present here a brief description of the method.

Suppose that we estimate the number of errors to be $\hat{N}'$, and
that this is based on data from n' points. Our objective is to
construct a confidence interval. For definiteness let the desired
confidence level be 90 percent. The method is based on finding two
numbers, $N_1$ and $N_2$. $N_1$ is such a number that when the true
number of errors is $N_1$, 5 percent of the estimates are below
$\hat{N}'$. Similarly, $N_2$ is such that when N is equal to $N_2$,
5 percent of the estimates are above $\hat{N}'$. These values form a
confidence interval ($N_2$, $N_1$) as discussed in Appendix D.
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The  detailed  procedure  for  using  this  method  is  illustrated  next 
for the cases n' = 60 and n' = 100. The first step is to construct
the percentile curves, as shown in Fig. 6.1 and 6.2. The point A, on
Fig. 6.1, indicates that if the initial number of errors is N = 90, and
n' = 60 of those errors were corrected, then 10% of the estimates,
$\hat{N}$, will be below 73 and 90% will be above it. Similarly, point
B shows that 75% of the estimates in this case are above 79. One can
conclude that 15% of the estimates will be within 73 and 79.

In order to determine the points A and B we start with the initial
number of errors N = 90. We can choose $\phi$ to be any positive
constant, as it was shown in Appendix D that $\phi$ does not influence
the confidence interval. Next, simulate the error detection process
and generate n' = 60 interdetection times $x_i$, according to the
probability density function (1-2). Based on this sequence, we
estimate $\hat{N}$ and record its value. This process of generating a
sequence and estimating $\hat{N}$ is repeated 1000 times and the
resulting estimates $\hat{N}$ are represented by a histogram. It was
found from the histogram that 10% of the estimates $\hat{N}$ are below
73; this is the basis for the construction of the point A.

Next, we change the initial number of errors to N = 80, and
repeat the process. When we obtain enough percentile points, we
can join them to form the curves of Fig. 6.1 and Fig. 6.2.
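The simulation procedure just described can be sketched as follows. The sketch is an illustration only: it assumes exponential interdetection times with rate phi (N - i + 1), uses the maximum-likelihood equation (A7) solved by bisection, uses 200 repetitions instead of 1000 to keep the run short, and treats samples for which (A7) has no finite root as infinitely large estimates. That last choice, the function names and the seed are assumptions, not taken from the report.

```python
import math
import random


def ml_N(x):
    """Estimate N from (A7) by bisection; None if no finite root is found."""
    n = len(x)
    c = sum(k * xk for k, xk in enumerate(x)) / sum(x)   # sum((i-1)x_i)/sum(x_i)
    if c <= (n - 1) / 2.0:
        return None

    def g(N):
        return sum(1.0 / (N - k) for k in range(n)) - n / (N - c)

    lo, hi = (n - 1) + 1e-9, 2.0 * n
    while g(hi) > 0:
        hi *= 2.0
        if hi > 1e7:
            return None
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending list, 0 < p <= 100."""
    rank = max(0, math.ceil(p / 100.0 * len(sorted_values)) - 1)
    return sorted_values[rank]


def percentile_points(N, n, phi=1.0, runs=200, seed=0,
                      levels=(10, 25, 50, 75, 90)):
    """Simulate `runs` sequences for a true N and return estimate percentiles."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(runs):
        x = [rng.expovariate(phi * (N - k)) for k in range(n)]
        est = ml_N(x)
        estimates.append(float("inf") if est is None else est)
    estimates.sort()
    return {p: percentile(estimates, p) for p in levels}


points = percentile_points(N=90, n=60)   # one vertical slice of Fig. 6.1
print(points)
```

Repeating `percentile_points` for several values of N and joining the points of equal percentile gives curves of the kind shown in Fig. 6.1 and Fig. 6.2.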

The percentile curves, along with the estimate $\hat{N}$, allow us to
construct a confidence interval of any desired level. Furthermore,
the interval may be one-sided or two-sided. For example, if the
estimate is $\hat{N}$ = 120 for the case n' = 100, we can see from
Fig. 6.2 that the 5% curve intersects the $\hat{N}$ = 120 line at
N = 145, forming a 95% one-tailed interval. In other words, there is
a 95% probability that the true number of initial errors is below 145.
Similarly, the probability is 90% that N is below 138. One can
construct a two-tailed interval by considering the upper and the lower
limits. Thus, the probability is 90% that N is between 107 and 145.
Also, the probability is 75% that N is between 110 and 136. The
two-tailed intervals do not have to be symmetric. Consequently, the
probability is 85% that 110 < N < 145 or that 107 < N < 136.

Note that in the actual testing $\hat{N}$ is known and therefore we
need the percentile curves only around that level of $\hat{N}$. This
reduces the amount of work considerably.

This method is suitable for estimates with one unknown parameter,
such as $\hat{N}$, which depends only on N. If the estimator is a
function of two parameters, as in the case for $\hat{T}$, being
dependent on T and N, this method becomes quite complex and it is not
recommended.


6.3  Effect  of  N  and  n  on  Estimator  Accuracy 
An alternative approach for describing the accuracy of the
estimator is by studying the effect of N, the initial number of
errors, and n, the number of the detected errors, on the accuracy of
the estimator. Here again we rely on the results of Appendix D which
shows that the estimator $\hat{N}$ is independent of the parameter
$\phi$.

The first step in the evaluation of the accuracy is to define an
error term, E, which describes the inaccuracy of the estimate. Here
we select E as

$$E = \left[\text{Expected value of } (\hat{N}-N)^2\right]^{1/2} \tag{6-11}$$


Fig. 6.3  The estimation error as a function of N, the initial
number of errors, and the number of the remaining errors, N-n.
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curves  of  constant  estimate  errors  as  functions  of  N  and  N-n.  This 
is  shown  in  Fig.  6.4. 

Figure  6.3  indicates  clearly  that  there  is  a  "knee"  in  the  error 
curve  for  values  of  N,  beyond  which  the  estimate  error  becomes  very 
large.  This  knee  occurs  at  N-n*0  for  the  curve  N*75  and  moves  to 
N-n»25  for  N-150.  This  information  may  be  useful  in  evaluating  the 
quality of the estimate. For example, let the number of the detected
errors be n = 60 and the estimate $\hat{N}$ equal 100. If we assume
that the accuracy of the estimate is good and therefore
$\hat{N} \approx N$, the resulting N-n will be approximately 40, and
from Fig. 6.3 we see that for this condition the error is very large,
and the estimate is of no practical value. On the other hand, if the
number of the detected errors is n = 145 and the estimate is
$\hat{N}$ = 150, the accuracy of the result is probably good.

Figure  6.4  reveals  another  interesting  point.  It  shows  that  as 
N  increases,  the  accuracy  of  the  estimator  improves  significantly, 
even  if  the  number  of  remaining  errors,  N-n,  remains  the  same. 

The results of this section confirm the intuitive feeling that
the estimator accuracy improves as N increases or as N-n decreases.
But it goes beyond that by providing a quantitative measure for E,
as shown in Fig. 6.3.
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7.0  Conclusions 


Several  methods  were  developed  for  estimating  N,  the  number  of 
errors  in  a  program,  and  T,  the  MTTF.  The  estimators  form  two  groups: 
The  Standard  estimators  and  the  Geometric  ones.  The  Standard  estimators 
determine  both  N  and  T  whereas  the  Geometric  estimators  can  evaluate 
only  T.  The  various  estimators  were  tested  and  evaluated.  It  was 
found  that  the  various  estimators  are  strongly  correlated  but  they 
differ  enough  to  justify  generating  all  of  them. 

The estimators were later modified to fit the case where error
correction is delayed, as may be required in our case. Test results
indicate essentially the same behavior as in the case of Instant
Correction.

In  addition  to  estimating  N  and  T,  we  can  learn  about  the  accuracy 
of  the  estimates.  This  can  be  done  by  constructing  confidence  intervals 
by  one  of  the  two  methods  suggested  in  this  study  or  by  observing  the 
effects  of  N  and  n  on  the  estimate  accuracy,  as  discussed  in  Section  6. 


Appendix  A-Derivation  of  Model  Equations 
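A.1  Maximum Likelihood

The probability density of $x_i$ is

$$f(x_i) = \phi (N-i+1)\, e^{-\phi (N-i+1) x_i}, \qquad i = 1, 2, \ldots, n \tag{A1}$$

The likelihood, L, is

$$L(x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i) = \prod_{i=1}^{n} \phi (N-i+1)\, e^{-\phi (N-i+1) x_i} \tag{A2}$$

In order to maximize the likelihood, we may maximize ln(L).

$$\ln L(x_1, \ldots, x_n) = \sum_{i=1}^{n} \left[\ln\phi + \ln(N-i+1) - \phi (N-i+1) x_i\right]
= n \ln\phi + \sum_{i=1}^{n} \ln(N-i+1) - \phi \sum_{i=1}^{n} (N-i+1) x_i \tag{A3}$$

Now require:

$$\frac{\partial \ln L}{\partial N} = \sum_{i=1}^{n} \frac{1}{N-i+1} - \phi \sum_{i=1}^{n} x_i = 0 \tag{A4}$$

$$\frac{\partial \ln L}{\partial \phi} = \frac{n}{\phi} - \sum_{i=1}^{n} (N-i+1) x_i = 0 \tag{A5}$$

Rewrite (A4) as

$$\phi = \frac{\sum_{i=1}^{n} \dfrac{1}{N-i+1}}{\sum_{i=1}^{n} x_i} \tag{A6}$$

Now substitute (A6) in (A5) and rearrange

$$\sum_{i=1}^{n} \frac{1}{N-i+1} = \frac{n}{N - \dfrac{\sum_{i=1}^{n} (i-1) x_i}{\sum_{i=1}^{n} x_i}} \tag{A7}$$

(A7) and (A6) will be used for estimating N and $\phi$ respectively.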

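A.2  Geometric Maximum Likelihood Model

The probability density function of $x_i$ is given by

$$f(x_i) = \lambda_0 a^i\, e^{-\lambda_0 a^i x_i} \tag{A8}$$

The likelihood function is

$$L(x_1, \ldots, x_n) = \prod_{i=1}^{n} \lambda_0 a^i\, e^{-\lambda_0 a^i x_i} \tag{A9}$$

$$\ln L(x_1, \ldots, x_n) = \sum_{i=1}^{n} \left[\ln\lambda_0 + i \ln a - \lambda_0 a^i x_i\right]
= n \ln\lambda_0 + \frac{n(n+1)}{2} \ln a - \lambda_0 \sum_{i=1}^{n} a^i x_i \tag{A10}$$

In order to maximize the likelihood, require

$$\frac{\partial \ln L}{\partial a} = \frac{n(n+1)}{2a} - \lambda_0 \sum_{i=1}^{n} i\, a^{i-1} x_i = 0 \tag{A11}$$

$$\frac{\partial \ln L}{\partial \lambda_0} = \frac{n}{\lambda_0} - \sum_{i=1}^{n} a^i x_i = 0 \tag{A12}$$

Rewrite (A12) as

$$\lambda_0 = \frac{n}{\sum_{i=1}^{n} a^i x_i} \tag{A13}$$

And substitute (A13) in (A11) to form:

$$\frac{n(n+1)}{2a} = \frac{n \sum_{i=1}^{n} i\, a^{i-1} x_i}{\sum_{i=1}^{n} a^i x_i} \tag{A14}$$

Solve (A14) for a and (A13) for $\lambda_0$.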
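One possible way to solve (A14) numerically is sketched below. Multiplying (A14) by a/n shows that it is equivalent to requiring that the mean of i, weighted by $a^i x_i$, equals (n+1)/2; the weighted mean increases with a, so bisection applies. The bracket [0.5, 2.0], the function name and the test values a = 0.985, lambda_0 = 0.5 are assumptions made for this illustration, not values prescribed by the report.

```python
def geometric_ml(x, lo=0.5, hi=2.0, iters=100):
    """Solve (A14) for a by bisection, then get lambda_0 from (A13).

    Returns (a_hat, lam_hat), or None if the root is outside [lo, hi].
    """
    n = len(x)

    def h(a):
        # weighted mean of i with weights a^i x_i, minus (n + 1) / 2
        w = [a ** i * xi for i, xi in enumerate(x, start=1)]
        mean_i = sum(i * wi for i, wi in enumerate(w, start=1)) / sum(w)
        return mean_i - (n + 1) / 2.0

    if h(lo) > 0 or h(hi) < 0:
        return None                      # assumed bracket does not hold
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    a_hat = 0.5 * (lo + hi)
    lam_hat = n / sum(a_hat ** i * xi for i, xi in enumerate(x, start=1))  # (A13)
    return a_hat, lam_hat


# Deterministic check: expected values x_i = 1 / (lambda_0 a^i).
a_true, lam_true, n = 0.985, 0.5, 100
x = [1.0 / (lam_true * a_true ** i) for i in range(1, n + 1)]
a_hat, lam_hat = geometric_ml(x)
print(a_hat, lam_hat)
```

With expected values as input, all weights $a^i x_i$ are equal at the true a, so the weighted mean of i is exactly (n+1)/2 and the true parameters should be recovered up to rounding.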


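A.3  Least Square x Model

The error to be minimized is

$$E = \sum_{i=1}^{n} \left(x_i - \frac{1}{\phi (N-i+1)}\right)^2 \tag{A15}$$

Require

$$\frac{\partial E}{\partial \phi} = 2 \sum_{i=1}^{n} \left[\left(x_i - \frac{1}{\phi (N-i+1)}\right) \frac{1}{\phi^2 (N-i+1)}\right] = 0 \tag{A16}$$

$$\frac{\partial E}{\partial N} = 2 \sum_{i=1}^{n} \left[\left(x_i - \frac{1}{\phi (N-i+1)}\right) \frac{1}{\phi (N-i+1)^2}\right] = 0 \tag{A17}$$

Rewrite (A16) and (A17) as

$$\phi \sum_{i=1}^{n} \frac{x_i}{N-i+1} - \sum_{i=1}^{n} \frac{1}{(N-i+1)^2} = 0 \tag{A18}$$

$$\phi \sum_{i=1}^{n} \frac{x_i}{(N-i+1)^2} - \sum_{i=1}^{n} \frac{1}{(N-i+1)^3} = 0 \tag{A19}$$

Express $\phi$ as

$$\phi = \frac{\sum_{i=1}^{n} \dfrac{1}{(N-i+1)^2}}{\sum_{i=1}^{n} \dfrac{x_i}{N-i+1}} \tag{A20}$$

and substitute (A20) in (A19) to form

$$\sum_{i=1}^{n} \frac{x_i}{(N-i+1)^2} \sum_{i=1}^{n} \frac{1}{(N-i+1)^2} - \sum_{i=1}^{n} \frac{x_i}{N-i+1} \sum_{i=1}^{n} \frac{1}{(N-i+1)^3} = 0 \tag{A21}$$

Next, solve (A21) for N and determine $\phi$ from (A20).

A.4  Least Square t Model

The error here is

$$E = \sum_{i=1}^{n} \left(t_i - \sum_{j=1}^{i} \frac{1}{\phi (N-j+1)}\right)^2 \tag{A22}$$

where

$$t_i = \sum_{j=1}^{i} x_j \tag{A23}$$

In order to minimize E, require:

$$\frac{\partial E}{\partial N} = 2 \sum_{i=1}^{n} \left[\left(t_i - \sum_{j=1}^{i} \frac{1}{\phi (N-j+1)}\right) \sum_{j=1}^{i} \frac{1}{\phi (N-j+1)^2}\right] = 0 \tag{A24}$$

$$\frac{\partial E}{\partial \phi} = 2 \sum_{i=1}^{n} \left[\left(t_i - \sum_{j=1}^{i} \frac{1}{\phi (N-j+1)}\right) \sum_{j=1}^{i} \frac{1}{\phi^2 (N-j+1)}\right] = 0 \tag{A25}$$

Rewrite (A24) and (A25) as

$$\phi \sum_{i=1}^{n} \left[t_i \sum_{j=1}^{i} \frac{1}{(N-j+1)^2}\right] - \sum_{i=1}^{n} \left[\sum_{j=1}^{i} \frac{1}{N-j+1} \sum_{j=1}^{i} \frac{1}{(N-j+1)^2}\right] = 0 \tag{A26}$$

$$\phi \sum_{i=1}^{n} \left[t_i \sum_{j=1}^{i} \frac{1}{N-j+1}\right] - \sum_{i=1}^{n} \left[\sum_{j=1}^{i} \frac{1}{N-j+1}\right]^2 = 0 \tag{A27}$$

Now express $\phi$ as

$$\phi = \frac{\sum_{i=1}^{n} \left(\sum_{j=1}^{i} \dfrac{1}{N-j+1}\right)^2}{\sum_{i=1}^{n} \left(t_i \sum_{j=1}^{i} \dfrac{1}{N-j+1}\right)} \tag{A28}$$

And substitute (A28) in (A26) to form:

$$\sum_{i=1}^{n} \left(t_i \sum_{j=1}^{i} \frac{1}{(N-j+1)^2}\right) \sum_{i=1}^{n} \left(\sum_{j=1}^{i} \frac{1}{N-j+1}\right)^2
- \sum_{i=1}^{n} \left(t_i \sum_{j=1}^{i} \frac{1}{N-j+1}\right) \sum_{i=1}^{n} \left(\sum_{j=1}^{i} \frac{1}{N-j+1} \sum_{j=1}^{i} \frac{1}{(N-j+1)^2}\right) = 0 \tag{A29}$$

In order to simplify the expressions define

$$A_i = \sum_{j=1}^{i} \frac{1}{N-j+1} \tag{A30}$$

$$B_i = \sum_{j=1}^{i} \frac{1}{(N-j+1)^2} \tag{A31}$$

Then we may write (A29) as

$$\sum_{i=1}^{n} t_i B_i \sum_{i=1}^{n} A_i^2 - \sum_{i=1}^{n} t_i A_i \sum_{i=1}^{n} A_i B_i = 0 \tag{A32}$$

Also, (A28) will become:

$$\phi = \frac{\sum_{i=1}^{n} A_i^2}{\sum_{i=1}^{n} t_i A_i} \tag{A33}$$

Eq. (A32) is solved first for N, and (A33) is used to derive $\phi$.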

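A.5  Geometric Least Square x Model

The estimation error is

$$E = \sum_{i=1}^{n} \left(x_i - \frac{1}{\lambda_0 a^i}\right)^2 \tag{A34}$$

and the objective is to determine $\lambda_0$ and a which minimize E.
Require

$$\frac{\partial E}{\partial a} = 2 \sum_{i=1}^{n} \left[\left(x_i - \frac{1}{\lambda_0 a^i}\right) \frac{i}{\lambda_0 a^{i+1}}\right] = 0 \tag{A35}$$

$$\frac{\partial E}{\partial \lambda_0} = 2 \sum_{i=1}^{n} \left[\left(x_i - \frac{1}{\lambda_0 a^i}\right) \frac{1}{\lambda_0^2 a^i}\right] = 0 \tag{A36}$$

Rewrite (A35) and (A36) as

$$\lambda_0 \sum_{i=1}^{n} \frac{i\, x_i}{a^i} - \sum_{i=1}^{n} \frac{i}{a^{2i}} = 0 \tag{A37}$$

$$\lambda_0 \sum_{i=1}^{n} \frac{x_i}{a^i} - \sum_{i=1}^{n} \frac{1}{a^{2i}} = 0 \tag{A38}$$

Now, express $\lambda_0$ as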


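(A45)

Define the following functions

$$C_i = \sum_{j=1}^{i} \frac{1}{a^j} \tag{A46}$$

$$D_i = \sum_{j=1}^{i} \frac{j}{a^j} \tag{A47}$$

and re-write (A44) and (A45)

$$\lambda_0 \sum_{i=1}^{n} t_i D_i - \sum_{i=1}^{n} C_i D_i = 0 \tag{A48}$$

$$\lambda_0 \sum_{i=1}^{n} t_i C_i - \sum_{i=1}^{n} C_i^2 = 0 \tag{A49}$$

Now, derive $\lambda_0$ from (A49)

$$\lambda_0 = \frac{\sum_{i=1}^{n} C_i^2}{\sum_{i=1}^{n} t_i C_i} \tag{A50}$$

and substitute (A50) in (A48)

$$\sum_{i=1}^{n} t_i D_i \sum_{i=1}^{n} C_i^2 - \sum_{i=1}^{n} t_i C_i \sum_{i=1}^{n} C_i D_i = 0 \tag{A51}$$

A.7  Exponential Least Square Model

The estimation error here is

$$E = \sum_{i=1}^{n} \left[t_i + \frac{1}{\phi} \ln\left(\frac{N-i}{N}\right)\right]^2 \tag{A52}$$

In order to minimize E, require:

$$\frac{\partial E}{\partial \phi} = 2 \sum_{i=1}^{n} \left[\left(t_i + \frac{1}{\phi} \ln\frac{N-i}{N}\right) \left(-\frac{1}{\phi^2} \ln\frac{N-i}{N}\right)\right] = 0 \tag{A53}$$

$$\frac{\partial E}{\partial N} = 2 \sum_{i=1}^{n} \left[\left(t_i + \frac{1}{\phi} \ln\frac{N-i}{N}\right) \frac{1}{\phi} \frac{i}{N(N-i)}\right] = 0 \tag{A54}$$

Rearrange (A53) as

$$\phi \sum_{i=1}^{n} t_i \ln\left(\frac{N-i}{N}\right) + \sum_{i=1}^{n} \ln^2\left(\frac{N-i}{N}\right) = 0 \tag{A55}$$

Then

$$\phi = -\frac{\sum_{i=1}^{n} \ln^2\left(\dfrac{N-i}{N}\right)}{\sum_{i=1}^{n} t_i \ln\left(\dfrac{N-i}{N}\right)} \tag{A56}$$

Sub. (A56) in (A54)

$$\sum_{i=1}^{n} \frac{i\, t_i}{N-i} \sum_{i=1}^{n} \ln^2\left(\frac{N-i}{N}\right) - \sum_{i=1}^{n} t_i \ln\left(\frac{N-i}{N}\right) \sum_{i=1}^{n} \frac{i}{N-i} \ln\left(\frac{N-i}{N}\right) = 0 \tag{A57}$$

Eq. (A57) is solved for N and $\phi$ is determined from (A56).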


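Appendix B - Derivation of Model Equations for
Piecewise Constant Hazard

The equations for the modified models of Section 4 are derived in
this appendix.

B.1  Maximum Likelihood Model

The probability density function for $x_i$ is

$$f(x_i) = \phi N_j\, e^{-\phi N_j x_i} \tag{B1}$$

where j is the index of the tape on which the ith error was discovered.
The likelihood function is

$$L(x_1, x_2, \ldots, x_n) = \prod_{i \in I_1} N_1 \phi\, e^{-N_1 \phi x_i} \cdots \prod_{i \in I_k} N_k \phi\, e^{-N_k \phi x_i} \tag{B2}$$

$$\ln L(x_1, x_2, \ldots, x_n) = \sum_{j=1}^{k} \sum_{i \in I_j} \left(\ln\phi + \ln N_j - \phi N_j x_i\right) \tag{B3}$$

In order to maximize the likelihood require:

$$\frac{\partial \ln L}{\partial N} = \sum_{j=1}^{k} \sum_{i \in I_j} \left(\frac{1}{N_j} - \phi x_i\right) = 0 \tag{B4}$$

or

$$\sum_{j=1}^{k} \frac{n_j}{N_j} - \phi \sum_{i=1}^{n} x_i = 0 \tag{B5}$$

Similarly require

$$\frac{\partial \ln L}{\partial \phi} = \sum_{j=1}^{k} \sum_{i \in I_j} \left(\frac{1}{\phi} - N_j x_i\right) = 0 \tag{B6}$$

$$n - \phi \sum_{j=1}^{k} N_j \sum_{i \in I_j} x_i = 0 \tag{B7}$$

From (B5) we obtain

$$\phi = \frac{\sum_{j=1}^{k} \dfrac{n_j}{N_j}}{\sum_{i=1}^{n} x_i} \tag{B8}$$

Then substitute (B8) in (B7) to form:

$$\frac{n \sum_{i=1}^{n} x_i}{\sum_{j=1}^{k} N_j \sum_{i \in I_j} x_i} = \sum_{j=1}^{k} \frac{n_j}{N_j} \tag{B9}$$

Then N is the solution of (B9) and $\phi$ is derived from (B8).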


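B.2  Geometric Maximum Likelihood Model

Consider first the derivation of model I.

$$z_j = \lambda_0 a^j \tag{B10}$$

The likelihood function is

$$L(x_1, x_2, \ldots, x_n) = \prod_{i \in I_1} \lambda_0 a\, e^{-\lambda_0 a x_i} \prod_{i \in I_2} \lambda_0 a^2\, e^{-\lambda_0 a^2 x_i} \cdots \prod_{i \in I_k} \lambda_0 a^k\, e^{-\lambda_0 a^k x_i}$$

$$\ln L(x_1, \ldots, x_n) = \sum_{j=1}^{k} \sum_{i \in I_j} \left(\ln\lambda_0 + j \ln a - \lambda_0 a^j x_i\right)
= n \ln\lambda_0 + \ln a \sum_{j=1}^{k} j\, n_j - \lambda_0 \sum_{j=1}^{k} a^j \sum_{i \in I_j} x_i \tag{B11}$$

Now require:

$$\frac{\partial \ln L}{\partial \lambda_0} = \frac{n}{\lambda_0} - \sum_{j=1}^{k} a^j \sum_{i \in I_j} x_i = 0 \tag{B12}$$

$$\frac{\partial \ln L}{\partial a} = \frac{1}{a} \sum_{j=1}^{k} j\, n_j - \lambda_0 \sum_{j=1}^{k} j\, a^{j-1} \sum_{i \in I_j} x_i = 0 \tag{B13}$$

Combining (B12) and (B13) gives the estimate for a as the solution of

$$\frac{n \sum_{j=1}^{k} j\, a^j \sum_{i \in I_j} x_i}{\sum_{j=1}^{k} j\, n_j} = \sum_{j=1}^{k} a^j \sum_{i \in I_j} x_i \tag{B14}$$

and the estimate of $\lambda_0$ is found from (B12)

$$\lambda_0 = \frac{n}{\sum_{j=1}^{k} a^j \sum_{i \in I_j} x_i} \tag{B15}$$

For model II the hazard function is

$$z_j = \lambda_0 a^{M_j} \tag{B16}$$

where

$$M_j = n_1 + n_2 + \cdots + n_{j-1} \tag{B17}$$

The likelihood here is

$$L(x_1, \ldots, x_n) = \prod_{i \in I_1} \lambda_0\, e^{-\lambda_0 x_i} \cdots \prod_{i \in I_k} \lambda_0 a^{M_k}\, e^{-\lambda_0 a^{M_k} x_i} \tag{B18}$$

$$\ln L(x_1, \ldots, x_n) = \sum_{j=1}^{k} \sum_{i \in I_j} \left(\ln\lambda_0 + M_j \ln a - \lambda_0 a^{M_j} x_i\right)
= n \ln\lambda_0 + \ln a \sum_{j=1}^{k} n_j M_j - \lambda_0 \sum_{j=1}^{k} a^{M_j} \sum_{i \in I_j} x_i \tag{B19}$$

Require

$$\frac{\partial \ln L}{\partial \lambda_0} = \frac{n}{\lambda_0} - \sum_{j=1}^{k} a^{M_j} \sum_{i \in I_j} x_i = 0 \tag{B20}$$

$$\frac{\partial \ln L}{\partial a} = \frac{1}{a} \sum_{j=1}^{k} n_j M_j - \lambda_0 \sum_{j=1}^{k} M_j a^{M_j - 1} \sum_{i \in I_j} x_i = 0 \tag{B21}$$

Combine (B20) with (B21) to form:

$$\frac{n \sum_{j=1}^{k} \left(M_j a^{M_j} \sum_{i \in I_j} x_i\right)}{\sum_{j=1}^{k} n_j M_j} = \sum_{j=1}^{k} a^{M_j} \sum_{i \in I_j} x_i \tag{B22}$$

Eq. (B22) is solved for a. Later, $\lambda_0$ is found from (B20)

$$\lambda_0 = \frac{n}{\sum_{j=1}^{k} \left(a^{M_j} \sum_{i \in I_j} x_i\right)} \tag{B23}$$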


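B.3  Least Square x Model

The estimation error to be minimized is

$$E = \sum_{i=1}^{n} (x_i - \bar{x}_i)^2 = \sum_{j=1}^{k} \sum_{i \in I_j} \left(x_i - \frac{1}{\phi N_j}\right)^2 \tag{B24}$$

Note that

$$N_j = N - n_1 - \cdots - n_{j-1} \tag{B25}$$

Therefore

$$\frac{\partial N_j}{\partial N} = 1 \tag{B26}$$

Now require

$$\frac{\partial E}{\partial N} = 2 \sum_{j=1}^{k} \sum_{i \in I_j} \left[\left(x_i - \frac{1}{\phi N_j}\right) \frac{1}{\phi N_j^2}\right] = 0 \tag{B27}$$

$$\frac{\partial E}{\partial \phi} = 2 \sum_{j=1}^{k} \sum_{i \in I_j} \left[\left(x_i - \frac{1}{\phi N_j}\right) \frac{1}{\phi^2 N_j}\right] = 0 \tag{B28}$$

Rewrite (B27) and (B28) as

$$\phi \sum_{j=1}^{k} \left(\frac{1}{N_j^2} \sum_{i \in I_j} x_i\right) - \sum_{j=1}^{k} \frac{n_j}{N_j^3} = 0 \tag{B29}$$

$$\phi \sum_{j=1}^{k} \left(\frac{1}{N_j} \sum_{i \in I_j} x_i\right) - \sum_{j=1}^{k} \frac{n_j}{N_j^2} = 0 \tag{B30}$$

Now, combine (B29) and (B30) to form: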


$$\phi \sum_{i=1}^{n} \left(t_i \sum_{m=1}^{i} \frac{1}{N_{(m)}}\right) - \sum_{i=1}^{n} \left(\sum_{m=1}^{i} \frac{1}{N_{(m)}}\right)^2 = 0 \tag{B37}$$

(B36) and (B37) can be simplified by using the notation

$$A_i = \sum_{m=1}^{i} \frac{1}{N_{(m)}} \tag{B38}$$

$$B_i = \sum_{m=1}^{i} \frac{1}{N_{(m)}^2} \tag{B39}$$

This allows writing (B36) and (B37) as

$$\phi \sum_{i=1}^{n} t_i B_i - \sum_{i=1}^{n} A_i B_i = 0 \tag{B40}$$

$$\phi \sum_{i=1}^{n} t_i A_i - \sum_{i=1}^{n} A_i^2 = 0 \tag{B41}$$

(B40) and (B41) may be combined to form

$$\sum_{i=1}^{n} t_i B_i \sum_{i=1}^{n} A_i^2 = \sum_{i=1}^{n} t_i A_i \sum_{i=1}^{n} A_i B_i \tag{B42}$$

and

$$\phi = \frac{\sum_{i=1}^{n} A_i^2}{\sum_{i=1}^{n} t_i A_i} \tag{B43}$$

Note that (B42) and (B43) are identical to (A32) and (A33), except
that $A_i$ and $B_i$ are defined slightly differently.


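B.5  Geometric Least Square x Model

The estimation error here is

$$E = \sum_{i=1}^{n} (x_i - \bar{x}_i)^2 = \sum_{i=1}^{n} \left(x_i - \frac{1}{\lambda_0 a^{M_{(i)}}}\right)^2 \tag{B44}$$

To minimize E, require:

$$\frac{\partial E}{\partial a} = \sum_{i=1}^{n} 2 \left[\left(x_i - \frac{1}{\lambda_0 a^{M_{(i)}}}\right) \left(\frac{M_{(i)}}{\lambda_0 a^{M_{(i)}+1}}\right)\right] = 0 \tag{B45}$$

$$\frac{\partial E}{\partial \lambda_0} = \sum_{i=1}^{n} 2 \left[\left(x_i - \frac{1}{\lambda_0 a^{M_{(i)}}}\right) \left(\frac{1}{\lambda_0^2 a^{M_{(i)}}}\right)\right] = 0 \tag{B46}$$

Rewrite the above as

$$\lambda_0 \sum_{i=1}^{n} \frac{x_i M_{(i)}}{a^{M_{(i)}}} - \sum_{i=1}^{n} \frac{M_{(i)}}{a^{2M_{(i)}}} = 0 \tag{B47}$$

$$\lambda_0 \sum_{i=1}^{n} \frac{x_i}{a^{M_{(i)}}} - \sum_{i=1}^{n} \frac{1}{a^{2M_{(i)}}} = 0 \tag{B48}$$

Combine (B47) and (B48) to form

$$\sum_{i=1}^{n} \frac{x_i M_{(i)}}{a^{M_{(i)}}} \sum_{i=1}^{n} \frac{1}{a^{2M_{(i)}}} = \sum_{i=1}^{n} \frac{x_i}{a^{M_{(i)}}} \sum_{i=1}^{n} \frac{M_{(i)}}{a^{2M_{(i)}}} \tag{B49}$$

The solution of (B49) gives $\hat{a}$, the best estimate of a.
$\hat{\lambda}_0$ is found from (B48) to be

$$\hat{\lambda}_0 = \frac{\sum_{i=1}^{n} \dfrac{1}{\hat{a}^{2M_{(i)}}}}{\sum_{i=1}^{n} \dfrac{x_i}{\hat{a}^{M_{(i)}}}} \tag{B50}$$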


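B.6  Geometric Least Square t Model

The estimation error to be minimized is

$$E = \sum_{i=1}^{n} \left(t_i - \sum_{m=1}^{i} \frac{1}{\lambda_0 a^{M_{(m)}}}\right)^2 \tag{B51}$$

Require:

$$\frac{\partial E}{\partial a} = \sum_{i=1}^{n} \left[2 \left(t_i - \sum_{m=1}^{i} \frac{1}{\lambda_0 a^{M_{(m)}}}\right) \sum_{m=1}^{i} \frac{M_{(m)}}{\lambda_0 a^{M_{(m)}+1}}\right] = 0 \tag{B52}$$

$$\frac{\partial E}{\partial \lambda_0} = \sum_{i=1}^{n} \left[2 \left(t_i - \sum_{m=1}^{i} \frac{1}{\lambda_0 a^{M_{(m)}}}\right) \sum_{m=1}^{i} \frac{1}{\lambda_0^2 a^{M_{(m)}}}\right] = 0 \tag{B53}$$

Rewrite the above equations as:

$$\lambda_0 \sum_{i=1}^{n} \left(t_i \sum_{m=1}^{i} \frac{M_{(m)}}{a^{M_{(m)}}}\right) - \sum_{i=1}^{n} \left(\sum_{m=1}^{i} \frac{1}{a^{M_{(m)}}} \sum_{m=1}^{i} \frac{M_{(m)}}{a^{M_{(m)}}}\right) = 0 \tag{B54}$$

$$\lambda_0 \sum_{i=1}^{n} \left(t_i \sum_{m=1}^{i} \frac{1}{a^{M_{(m)}}}\right) - \sum_{i=1}^{n} \left(\sum_{m=1}^{i} \frac{1}{a^{M_{(m)}}}\right)^2 = 0 \tag{B55}$$

Define

$$A_i = \sum_{m=1}^{i} \frac{1}{a^{M_{(m)}}} \tag{B56}$$

$$B_i = \sum_{m=1}^{i} \frac{M_{(m)}}{a^{M_{(m)}}} \tag{B57}$$

Then we can rewrite (B54) and (B55) as

$$\lambda_0 \sum_{i=1}^{n} t_i B_i - \sum_{i=1}^{n} A_i B_i = 0 \tag{B58}$$

$$\lambda_0 \sum_{i=1}^{n} t_i A_i - \sum_{i=1}^{n} A_i^2 = 0 \tag{B59}$$

Then a can be found from:

$$\sum_{i=1}^{n} t_i B_i \sum_{i=1}^{n} A_i^2 - \sum_{i=1}^{n} t_i A_i \sum_{i=1}^{n} A_i B_i = 0 \tag{B60}$$

and $\lambda_0$ is found from

$$\lambda_0 = \frac{\sum_{i=1}^{n} A_i^2}{\sum_{i=1}^{n} t_i A_i} \tag{B61}$$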


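Appendix C - Derivation of Var(N) and Var(T)

Let the variables $x_1, x_2, \ldots, x_n$ have a probability density
function $f(x_1, x_2, \ldots, x_n; \theta_1, \theta_2, \ldots, \theta_q)$.
If $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_q$ are the
maximum likelihood estimators of $\theta_1, \theta_2, \ldots, \theta_q$,
and n is large, then $\hat{\theta}_1, \hat{\theta}_2, \ldots,
\hat{\theta}_q$ are approximately distributed by the multivariate
normal distribution with means $\theta_1, \theta_2, \ldots, \theta_q$.
Moreover, if we define the matrix R to have the elements

$$r_{ij} = -E\left[\frac{\partial^2}{\partial \theta_i\, \partial \theta_j} \ln f(x_1, x_2, \ldots, x_n; \theta_1, \theta_2, \ldots, \theta_q)\right] \tag{c-1}$$

then the variance matrix, V, equals

$$V = R^{-1} \tag{c-2}$$

Next, we use (c-1) and (c-2) to derive Var(N) and Var(T).

C.1  Derivation of Var(N)

Consider the probability function

$$f(x_1, x_2, \ldots, x_n; N, \phi) = \prod_{i=1}^{n} \phi (N-i+1)\, e^{-\phi (N-i+1) x_i} \tag{c-3}$$

$$\ln f(x_1, x_2, \ldots, x_n; N, \phi) = n \ln\phi + \sum_{i=1}^{n} \ln(N-i+1) - \phi \sum_{i=1}^{n} (N-i+1) x_i \tag{c-4}$$

Next, differentiate (c-4) in order to obtain the $r_{ij}$ terms

$$\frac{\partial \ln f}{\partial \phi} = \frac{n}{\phi} - \sum_{i=1}^{n} (N-i+1) x_i$$

$$\frac{\partial^2 \ln f}{\partial \phi^2} = \frac{-n}{\phi^2}$$

$$\frac{\partial^2 \ln f}{\partial \phi\, \partial N} = -\sum_{i=1}^{n} x_i$$

$$\frac{\partial \ln f}{\partial N} = \sum_{i=1}^{n} \frac{1}{N-i+1} - \phi \sum_{i=1}^{n} x_i$$

$$\frac{\partial^2 \ln f}{\partial N^2} = \sum_{i=1}^{n} \frac{-1}{(N-i+1)^2}$$

Define the quantities A and S by

$$A = \sum_{i=1}^{n} x_i \tag{c-5}$$

$$S = \sum_{i=1}^{n} \frac{1}{(N-i+1)^2} \tag{c-6}$$

Then we can express R by

$$R = \begin{pmatrix} S & A \\ A & \dfrac{n}{\phi^2} \end{pmatrix} \tag{c-7}$$

In order to find the inverse of R, note that the determinant,
$\Delta$, is

$$\Delta = \frac{S n}{\phi^2} - A^2 \tag{c-8}$$

Then the inverse matrix is

$$V = R^{-1} = \frac{1}{\Delta} \begin{pmatrix} \dfrac{n}{\phi^2} & -A \\ -A & S \end{pmatrix} \tag{c-9}$$

We are interested in Var(N) which equals

$$\mathrm{Var}(N) = \frac{n}{n S - A^2 \phi^2} \tag{c-10}$$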
C.2  Derivation  of  Var(T) 


We  wish  to  express  the  probability  density  function  of  x^,  x2, 
xq,  in  terms  of  N  and  T,  where  T  is  the  mean  time  to  failure  after 
the  correction  of  n  errors.  This  is  done  by  noting  that 


<KN-n) 


(c-11) 


Then  we  may  write  the  probability  density  function  as 


f(xx,  x2, 


xn;  N,  T) 


(N-i+1) 


±ml  (N-n)  T 


(N-i+1) 
(N-n)  T 


(c-12) 


Therefore, 


In  f(x.,  x,...x  ;  N,T)  -  £  In  (N-i+1)  -  n  In  (N-n)  -  n  In  T 
l  x  n 


"  T  i-1  ^  '  (N-n)T  if1  (n_1+1)  xi 


(c-12) 


Recall  (c-5)  and  (c-6)  and  define 


C-4 


n 

B  -  E  (n-i+1)  x  (c-13) 

i-1  1 


Then  we  may  write  In  f(x^,  *'*xn»  N,T)  as 


n 

In  f(x  ,  x0,  ...  x  ;  N,T)  -  E  ln(N-i+l)  -  n  In  (N-n)  -  n  In  T 
1  *  n  .. 


A  _ B_ 

"  T  "  (N-n)T 

The  partial  derivatives  of  In  f (•)  are: 


32lnf  .  ”  -1  .  n  2B 

2  ^  2  +  2  ”  3 
3N  i-1  (N-i+1)  (N-n)  (N-n)  T 


32lnf 

3N,3T 


-N 

2  2 

(N-nr  1 


d2lnf  _n_  2A  2B 

2  =  2  -  3  3 

3T  T  T  (N-n)  T 


The  matrix  R  is 


n  +  2B 
(N-n)2  (N-n)3  T 


B 

(N-n)2  T2 


B 

2  2 
(N-n) L  TZ 

_n  .  2A  2B 

2  +  3  3 

1  TJ  (N-n)  TJ 


(c-14) 


(c-15) 


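The determinant of R is

$$\Delta = \left( S - \frac{n}{(N-n)^2} + \frac{2B}{(N-n)^3 T} \right) \left( -\frac{n}{T^2} + \frac{2A}{T^3} + \frac{2B}{(N-n) T^3} \right) - \frac{B^2}{(N-n)^4 T^4} \tag{c-16}$$

and the variance matrix, V, is

$$V = R^{-1} = \frac{1}{\Delta} \begin{bmatrix} -\dfrac{n}{T^2} + \dfrac{2A}{T^3} + \dfrac{2B}{(N-n) T^3} & \dfrac{-B}{(N-n)^2 T^2} \\ \dfrac{-B}{(N-n)^2 T^2} & S - \dfrac{n}{(N-n)^2} + \dfrac{2B}{(N-n)^3 T} \end{bmatrix}$$

The variance of T is therefore:

$$\mathrm{Var}(T) = \frac{1}{\Delta} \left( S - \frac{n}{(N-n)^2} + \frac{2B}{(N-n)^3 T} \right) \tag{c-18}$$

The expressions (c-10) and (c-18) are used to determine the
confidence intervals in Section 6.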
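The steps of this derivation can be checked numerically. The Python sketch below is illustrative only: it builds the three entries of R from the second derivatives of the log-likelihood and returns Var(T) as the lower-right element of the inverse of R. It assumes A is the sum of the x_i, B is as in (c-13), and S is the sum of 1/(N-i+1)^2. The function names and the sample values are hypothetical.

```python
import math


def log_likelihood(x, N, T):
    """ln f(x_1..x_n; N, T) written with A and B, as in Section C.2."""
    n = len(x)
    A = sum(x)
    B = sum((n - i + 1) * xi for i, xi in enumerate(x, start=1))
    return (sum(math.log(N - i + 1) for i in range(1, n + 1))
            - n * math.log(N - n) - n * math.log(T)
            - A / T - B / ((N - n) * T))


def information_matrix(x, N, T):
    """Return (r11, r12, r22): the negated second derivatives of ln f."""
    n = len(x)
    A = sum(x)
    B = sum((n - i + 1) * xi for i, xi in enumerate(x, start=1))
    S = sum(1.0 / (N - i + 1) ** 2 for i in range(1, n + 1))
    d = N - n
    r11 = S - n / d ** 2 + 2.0 * B / (d ** 3 * T)
    r12 = B / (d ** 2 * T ** 2)
    r22 = -n / T ** 2 + 2.0 * A / T ** 3 + 2.0 * B / (d * T ** 3)
    return r11, r12, r22


def var_t(x, N, T):
    """Var(T) = r11 / det(R), the lower-right element of the inverse of R."""
    r11, r12, r22 = information_matrix(x, N, T)
    det = r11 * r22 - r12 ** 2
    if det == 0.0:
        raise ValueError("R is singular at these values")
    # The result is only meaningful as a variance when R is positive definite.
    return r11 / det


# Illustrative values only; these are not estimates from real data.
print(var_t([1.0, 2.0, 4.0], N=10, T=1.0))
```

Keeping `log_likelihood` next to `information_matrix` makes it possible to compare the closed-form entries against finite differences of the log-likelihood, which is a convenient check on the signs.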


L(N) and H(N) may be plotted against N as in Fig. D.1. A vertical line
through any chosen value N' will intersect the two curves in points
which, projected on the $\hat{N}$ axis, will give limits between which
$\hat{N}$ will fall with probability 0.90.

Having constructed the two curves $\hat{N} = L(N)$ and $\hat{N} = H(N)$, we
may construct a confidence interval for N as follows: On the basis
of the sample of n failures compute the value of the estimator,
say $\hat{N}'$. A horizontal line through the point $\hat{N}'$ on the
$\hat{N}$ axis (Fig. D.1) will intersect the two curves at points which may
be projected on the N axis and labeled N_1 and N_2, as in the figure.
These two numbers define the confidence interval, for it is easily
shown that

$$P(N_2 < N < N_1) = 0.90 \tag{D-4}$$

In order to clarify this point suppose that the true number of
errors is N'. The probability that the estimate will fall between
L(N') and H(N') is 0.90. If the estimate does fall between these
limits, then the horizontal line will cut the vertical line, which
goes through N', at some point between the curves, and the corresponding
interval (N_2, N_1) will cover N'. If the estimate does not fall between
L(N') and H(N'), the horizontal line does not cut the vertical line
between the curves, and the corresponding interval (N_2, N_1) does not
cover N'. It follows, therefore, that the probability is exactly
0.90 that an interval (N_2, N_1) constructed by this method will cover N'.
This is true for any value of N.

It is possible to determine the limits N_2 and N_1 for a given
estimate without finding the curves L(N) and H(N). Referring to
Fig. D.1, the limits for N are the points N_2 and N_1 such that
$L(N_1) = \hat{N}'$ and $H(N_2) = \hat{N}'$. Thus, instead of finding the two curves,
we may solve for the points N_1 and N_2 which satisfy these conditions.
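Solving for the two limits amounts to two one-dimensional root searches. The Python sketch below shows one way to do this by bisection, assuming both curves are continuous and increasing in N. The curves `L` and `H` used here are toy straight lines chosen only so that the example runs; they are not the curves of Fig. D.1, and the name `solve_limit` is hypothetical.

```python
def solve_limit(curve, target, lo, hi, tol=1e-9):
    """Find N in [lo, hi] such that curve(N) = target, by bisection.

    curve is assumed to be continuous and increasing on [lo, hi].
    """
    if not (curve(lo) <= target <= curve(hi)):
        raise ValueError("target is not bracketed by curve(lo) and curve(hi)")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if curve(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Toy stand-ins for the lower and upper curves (assumptions for the example).
def L(N):
    return 0.8 * N


def H(N):
    return 1.5 * N


n_hat = 30.0                                 # observed value of the estimator
N1 = solve_limit(L, n_hat, 1.0, 1000.0)      # upper limit: L(N1) = n_hat
N2 = solve_limit(H, n_hat, 1.0, 1000.0)      # lower limit: H(N2) = n_hat
print(N2, N1)
```

In an actual application each evaluation of `L` or `H` would come from the estimator's distribution at that value of N, so only the few values of N visited by the search need to be examined.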

In order to apply this method we have to determine the probability
density of the estimator, $g(\hat{N}; N)$. Furthermore, we have to show
first that $g(\hat{N}; N)$ depends only on N. We may start with the general
assumption that $\hat{N}$ depends on all the system parameters, that is,
$g(\hat{N}; N, \phi, n)$. Since n, the number of corrected errors, is known, n is a
known quantity and not a parameter. Next, we have to show that $\hat{N}$ is
independent of $\phi$, in order to reduce $g(\hat{N}; N, \phi)$ to the desired form.

Suppose that instead of estimating N from x_1, x_2, ..., x_n, we
estimate it from a new sequence, y_1, y_2, ..., y_n, defined as

$$y_i = \phi x_i \tag{D-5}$$

Since the probability density function of x_i is

$$f(x_i) = (N-i+1)\,\phi\, e^{-(N-i+1)\phi x_i} \tag{D-6}$$

the probability density function of y_i can be found from

$$f_y(y_i) = \left| \frac{dx_i}{dy_i} \right| f_x(x_i) \tag{D-7}$$

This is found to be

$$f(y_i) = (N-i+1)\, e^{-(N-i+1) y_i} \tag{D-8}$$

Thus, the new random variable, y_i, is normalized such that it is
independent of $\phi$. If we estimate N on the basis of the y_i sequence,
the resulting estimate $\hat{N}$ will be independent of $\phi$. Recall Eq. (A.7),
which is used to estimate N.
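$$\sum_{i=1}^{n} \frac{1}{\hat{N}-i+1} = \frac{n}{\hat{N} - \dfrac{\sum_{i=1}^{n} (i-1)\,x_i}{\sum_{i=1}^{n} x_i}} \tag{D-9}$$

Now, rewrite (D-9) as

$$\sum_{i=1}^{n} \frac{1}{\hat{N}-i+1} = \frac{n}{\hat{N} - \dfrac{\sum_{i=1}^{n} (i-1)\,y_i}{\sum_{i=1}^{n} y_i}} \tag{D-10}$$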
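The invariance can be illustrated numerically. The Python sketch below solves the estimating equation for the estimate of N by bisection and applies it both to a sequence x_i and to the scaled sequence y_i = phi x_i. It treats N as a continuous variable and returns n when the equation has no root above n; both choices, the function name `mle_n`, and the sample data are assumptions made for this example.

```python
def mle_n(x, n_max=1e7, iters=60):
    """Solve sum 1/(N-i+1) = n / (N - r) for N >= n, where
    r = sum((i-1)*x_i) / sum(x_i). N is treated as continuous."""
    n = len(x)
    r = sum((i - 1) * xi for i, xi in enumerate(x, start=1)) / sum(x)

    def g(N):
        # Left side minus right side of the estimating equation.
        return sum(1.0 / (N - i + 1) for i in range(1, n + 1)) - n / (N - r)

    lo = float(n)
    if g(lo) <= 0.0:
        return lo                      # no root above n: use the boundary
    hi = 2.0 * lo
    while g(hi) > 0.0:                 # expand until the root is bracketed
        hi *= 2.0
        if hi > n_max:
            return float("inf")        # no finite estimate within the cap
    for _ in range(iters):             # bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x = [1.0, 1.1, 1.2, 1.3, 1.5]          # illustrative inter-detection times
phi = 0.05                             # arbitrary scale factor
y = [phi * xi for xi in x]
print(mle_n(x), mle_n(y))              # the two estimates should agree
```

Because the data enter the equation only through the ratio r, multiplying every x_i by the same constant leaves the solution unchanged apart from rounding.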


Note that (D-10) describes $\hat{N}$ as a function of the y's, and therefore,
$\hat{N}$ is independent of $\phi$. Thus, we have established that $\hat{N}$ is a function
of the parameter N and the known quantity n. Therefore, we may write
the density of $\hat{N}$ as $g(\hat{N}; N)$. In order to construct $g(\hat{N}; N)$, it is
realized that an analytical derivation is impossible, and therefore,
a simulation is used to find $g(\hat{N}; N)$. This is done by generating
1000 sequences of variables with the desired probability density
function, with N being set to some fixed value N', and n given.

For each sequence we evaluate $\hat{N}$, and we end up with 1000 estimates
of N. The histogram of $\hat{N}$ is a numerical approximation for $g(\hat{N}; N')$.
Note that this only gives $g(\hat{N}; N)$ for one point, N = N'. However, we
need to find $g(\hat{N}; N)$ for only a few points, as explained above.

The histograms were generated for various values of $\phi$, and
it was found, as expected from theory, that the histograms, and
$g(\hat{N}; N)$, are independent of $\phi$.
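A minimal sketch of such a simulation, in Python, is given below. It draws each x_i from an exponential distribution with rate (N'-i+1) phi, solves the estimating equation for each sequence, and reads L(N') and H(N') from the sorted estimates. The equal-tail choice of the 5% and 95% points, the values of N', n and phi, the seed, and all function names are assumptions made for the example, not details taken from the study.

```python
import random


def mle_n(x, n_max=1e7, iters=60):
    """Solve sum 1/(N-i+1) = n / (N - r) for N >= n (N continuous)."""
    n = len(x)
    r = sum((i - 1) * xi for i, xi in enumerate(x, start=1)) / sum(x)

    def g(N):
        return sum(1.0 / (N - i + 1) for i in range(1, n + 1)) - n / (N - r)

    lo = float(n)
    if g(lo) <= 0.0:
        return lo                       # boundary estimate
    hi = 2.0 * lo
    while g(hi) > 0.0:
        hi *= 2.0
        if hi > n_max:
            return float("inf")         # no finite estimate within the cap
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate_estimates(N_true, n, phi, runs=1000, seed=0):
    """Return the sorted estimates from `runs` simulated sequences."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(runs):
        # x_i is exponential with rate (N_true - i + 1) * phi
        x = [rng.expovariate((N_true - i + 1) * phi) for i in range(1, n + 1)]
        estimates.append(mle_n(x))
    return sorted(estimates)


def quantile(sorted_vals, p):
    """Empirical p-quantile taken by index, without interpolation."""
    k = min(len(sorted_vals) - 1, int(p * len(sorted_vals)))
    return sorted_vals[k]


est = simulate_estimates(N_true=30, n=20, phi=0.01)
print(quantile(est, 0.05), quantile(est, 0.95))   # approximate L(N'), H(N')
```

Repeating the run with a different phi and the same seed gives the same sorted estimates up to rounding, which is the numerical counterpart of the independence noted above. Sequences with no finite solution are recorded as infinity, so the upper quantile may itself be infinite when such sequences are common.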
The method described above can be modified easily to the case where
a one-sided interval is needed. In that case we only have to construct
L(N) and determine from it N_1. The confidence interval for
this case is n < N < N_1.

This method can be extended to the case where the estimator is
a function of two parameters, such as for $\hat{T}$, $g(\hat{T}; T, N)$. However,
the construction of the histograms with two parameters becomes very
complex and therefore this approach is abandoned. On the other hand,
it is impossible to express $\hat{T}$ as a function of a single parameter,
$g(\hat{T}; T)$. This made the present method attractive only for $\hat{N}$, whereas
the confidence intervals for $\hat{T}$ are determined by the method of Section
Appendix E: Error Data Collection Format

Three  separate  tables  are  needed  for  adequate  cross  correlation 
and  configuration  control  of  the  error  data. 


Table #1 will provide information concerning the test runs (where run is
defined as being the execution or attempted execution of a specific test
case). It is assumed that the entire OFP will be in residence in the
FCC computer during the execution of each test run. It is also assumed
that all test runs (including those which discovered no errors) will be
listed here. This is critical as the model will be attempting to calculate
an MTBF. The format is as follows:


Run #   Calendar Date   Time of Day   OFP Tape          Run Length   Short description
        of Run          of Run        Configuration #   (Sec)        of test case.
                                                                     E.g., Bus control
                                                                     component, verify
                                                                     transmission word
                                                                     count.


Table #2 will provide tape configuration information. The format for this
table follows:


Tape Configuration #   Calendar Date It Replaced   List of Changes
                       Previous Tape               from Previous
                                                   Tape Configuration
Table #3 provides the information on software errors discovered during the
testing. This table requests the execution time at which an error occurred.
This means that recorded data used to search for an error will have to be
time correlated to the FCC execution (preferably to within a major cycle).
This is intended to be the execution time for the error source rather than
the error symptom. For example, if an incorrect display is discovered, we
want to know the execution time at which the parameter being displayed was
incorrectly calculated (or output or formatted etc.) rather than the time
at which the faulty display was noticed. This table also requests that the
source component be identified. Here again we are not interested in
symptoms. If a single symptom is caused by several sources, each source
should be listed as a separate error. It should also be noted that a
single source may cause several symptoms. In this case we are interested
in only the source. Thus the errors recorded in this table are not
necessarily synonymous with anomaly reports. The above requests will require
considerable analysis of discovered anomalies. If this analysis is out
of your scope, please so inform us. The format for Table #3 is as follows:


Error #   Run # of Error   Execution time   Source Component   Tape            Anomaly      Error
          (note 1)         of Error         of Error           Configuration   Report No.   Category
                           Occurrence                                          (note 2)     (note 3)


NOTES:

1. This is the run # in which the error was first discovered.

2. This refers to anomaly reports for which this error is a source
(may be more than a single report).

3. One of the following categories:

a. Computational; e.g., index, equation, sign convention, modeling,
mixed mode, truncation, rounding, units, convergence, etc.

b. Logical; e.g., limit [illegible], logic branch, loop exit,
missing condition, flag, [illegible], storage reference, endless
loop, etc.

c. I/O; e.g., missing I/O, garbled I/O, wrong field size, format,
control, discrete usage, etc.

d. Data handling; e.g., data lost, write or read to wrong location,
number of entries, index or flag modification, bit manipulation, number
type conversion, subscripting, bounds, etc.

e. Configuration; e.g., compilation, segmentation, illegal
instruction, etc.

f. Routine/routine interface; e.g., pass wrong parameters, expect
wrong parameters, communicate with wrong data block, calling sequence, etc.

g. User interface; e.g., data read but not used, data rejected but used,
valid data rejected, incorrect mode change, etc.

h. Data base; e.g., uncoordinated use of data elements, incorrect
initialization, missing data, wrong location, etc.

i. Requirements compliance; e.g., duty cycle violated, specified
accuracy not met, specified timing not met.
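To indicate how the collected tables could feed the model, the Python sketch below turns rows of Table #1 and Table #3 into the inter-detection times x_i of Eq. (1-1). It rests on several assumptions that the data request does not state: that runs are executed in order of run number, that the error execution time is measured from the start of its run, and that cumulative execution time over all listed runs is the time axis. The record and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """One row of Table #1 (only the fields used here)."""
    run_no: int
    run_length_sec: float


@dataclass
class ErrorRecord:
    """One row of Table #3 (only the fields used here)."""
    error_no: int
    run_no: int
    exec_time_sec: float   # assumed to be measured from the start of the run


def inter_detection_times(runs, errors):
    """Return x_1..x_n, the time intervals between error detections."""
    # Cumulative execution time at the start of each run, including
    # runs in which no error was discovered.
    start = {}
    elapsed = 0.0
    for run in sorted(runs, key=lambda r: r.run_no):
        start[run.run_no] = elapsed
        elapsed += run.run_length_sec
    # Detection times t_1..t_n on the cumulative execution-time axis.
    t = sorted(start[e.run_no] + e.exec_time_sec for e in errors)
    if not t:
        return []
    # x_1 = t_1 and x_i = t_i - t_(i-1) for i = 2..n, as in Eq. (1-1).
    return [t[0]] + [t[i] - t[i - 1] for i in range(1, len(t))]


runs = [RunRecord(1, 100.0), RunRecord(2, 50.0), RunRecord(3, 200.0)]
errors = [ErrorRecord(1, 1, 40.0), ErrorRecord(2, 3, 10.0),
          ErrorRecord(3, 3, 150.0)]
print(inter_detection_times(runs, errors))
```

Run 2 discovers no error but still contributes its 50 seconds to the interval between the first and second errors, which is why the request asks that every run be listed.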